Co-authored-by: WebCraftPhil <118385120+WebCraftPhil@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add high revenue low profit analysis notebook" to "Add High Revenue, Low Profit Analysis notebook" on Jan 5, 2026
Adds a notebook to identify products and sub-categories generating high sales but low/negative profit—signals for pricing, discount, or cost issues.
Implementation
- Uses glob to find the first CSV in the data/ directory
- Handles column-name variants (Product Name vs Product, Sub-Category vs SubCategory)
- Saves results to notebooks/outputs/ as CSV
Example Usage
Includes actionable recommendations: review discounts, COGS, pricing strategy for flagged items.
Original prompt
Create a new Jupyter notebook file at notebooks/high_revenue_low_profit_analysis.ipynb in the repository WebCraftPhil/superstore-sales-analysis. The notebook must implement an analysis to identify "High Revenue, Low Profit" products and sub-categories and include explanatory markdown. Do not modify other files.
Notebook content (cells in order):
High Revenue, Low Profit Analysis (Very Strong Signal)
Goal
What this notebook does
data/ directory.
import os
import glob
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# optional in Jupyter (comment kept off the magic line so it is not parsed as an argument)
%matplotlib inline
sns.set(style="whitegrid")
pd.set_option("display.max_columns", 80)
pd.set_option("display.width", 120)
data_dir = "data"
csv_files = glob.glob(os.path.join(data_dir, "*.csv"))
if not csv_files:
    raise FileNotFoundError(f"No CSV files found in '{data_dir}'. Place your sales CSV in that folder.")
data_path = csv_files[0]
print("Using data file:", data_path)
df = pd.read_csv(data_path, low_memory=False)
print("Rows, columns:", df.shape)
df.head()
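One caveat on the loading step above: `glob.glob` returns matches in arbitrary, filesystem-dependent order, so `csv_files[0]` may differ between machines when `data/` holds more than one CSV. A minimal sketch of a reproducible choice follows. The helper name `pick_csv` and the file names are hypothetical, not part of the notebook.

```python
import glob
import os
import tempfile


def pick_csv(data_dir):
    """Return the alphabetically first CSV in data_dir, or raise if none exist."""
    # Sorting makes the choice reproducible; glob itself guarantees no order.
    csv_files = sorted(glob.glob(os.path.join(data_dir, "*.csv")))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in '{data_dir}'.")
    return csv_files[0]


# Demonstration in a throwaway directory with two hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b_orders.csv", "a_orders.csv"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("Sales,Profit\n1,1\n")
    chosen = os.path.basename(pick_csv(tmp))

print("Using data file:", chosen)
```

If the repository only ever ships a single CSV, the unsorted version behaves the same; sorting only matters once a second file appears.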
Notes on columns
Product Name (or Product), Sub-Category (or SubCategory), Sales, and Profit.
cols = {c.lower(): c for c in df.columns}
def find_col(possible_names):
    for name in possible_names:
        if name.lower() in cols:
            return cols[name.lower()]
    return None
product_col = find_col(["Product Name", "Product"])
subcat_col = find_col(["Sub-Category", "SubCategory", "Sub Category"])
sales_col = find_col(["Sales"])
profit_col = find_col(["Profit"])
print("Detected columns -> Product:", product_col, "Sub-Category:", subcat_col, "Sales:", sales_col, "Profit:", profit_col)
if sales_col is None or profit_col is None:
    raise ValueError("Sales and/or Profit column not found. Rename columns or update mapping.")
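To see how the case-insensitive lookup behaves, here is a small self-contained sketch on a toy frame. It is a variant of the prompt's `find_col`, not the same function: it takes the column list as a parameter instead of reading the module-level `cols` dict, and the toy column names are made up for illustration.

```python
import pandas as pd


def find_col(columns, possible_names):
    """Return the actual column name matching any candidate, ignoring case."""
    lookup = {c.lower(): c for c in columns}
    for name in possible_names:
        if name.lower() in lookup:
            return lookup[name.lower()]
    return None


# Toy frame with deliberately inconsistent header casing (illustrative only).
toy = pd.DataFrame(
    {"product": ["A"], "SubCategory": ["Chairs"], "SALES": [10.0], "Profit": [1.0]}
)

product_col = find_col(toy.columns, ["Product Name", "Product"])
subcat_col = find_col(toy.columns, ["Sub-Category", "SubCategory", "Sub Category"])
sales_col = find_col(toy.columns, ["Sales"])
missing_col = find_col(toy.columns, ["Discount"])

print(product_col, subcat_col, sales_col, missing_col)
```

Candidates are tried in order, so the first listed variant that exists wins; a header that differs by more than case (for example extra whitespace) would still be missed.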
# Convert to numeric
df[sales_col] = pd.to_numeric(df[sales_col], errors="coerce")
df[profit_col] = pd.to_numeric(df[profit_col], errors="coerce")
# Drop rows without sales or profit info
df_clean = df.dropna(subset=[sales_col, profit_col]).copy()
print("After dropping NA sales/profit:", df_clean.shape)
df_clean[[c for c in [product_col, subcat_col, sales_col, profit_col] if c is not None]].head()
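The cleaning step relies on `errors="coerce"` turning unparseable values into NaN so that `dropna` can remove them. A minimal sketch on made-up values shows the effect and counts how many rows are lost, which may be worth checking before trusting the aggregates. The sample data and the `n_bad` name are assumptions for illustration.

```python
import pandas as pd

# Made-up raw values: one unparseable Sales entry and one missing Profit.
raw = pd.DataFrame(
    {"Sales": ["100.5", "n/a", "20"], "Profit": ["10", "5", None]}
)

for col in ("Sales", "Profit"):
    # Unparseable strings become NaN instead of raising.
    raw[col] = pd.to_numeric(raw[col], errors="coerce")

# Rows where either value is missing after coercion.
n_bad = int(raw[["Sales", "Profit"]].isna().any(axis=1).sum())
clean = raw.dropna(subset=["Sales", "Profit"]).copy()

print(f"Dropped {n_bad} of {len(raw)} rows")
```

Note that formatted numbers such as "1,200" or "$50" would also be coerced to NaN and silently dropped, so a large `n_bad` usually signals a formatting issue rather than genuinely missing data.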
Aggregation
Code: Group by Product
if product_col:
    prod_agg = (
        df_clean.groupby(product_col)
        .agg(total_sales=(sales_col, "sum"), total_profit=(profit_col, "sum"), n_orders=(sales_col, "count"))
        .reset_index()
        .sort_values("total_sales", ascending=False)
    )
    prod_agg.head(10)
else:
    prod_agg = pd.DataFrame()
    print("Product column not available; skipping product-level aggregation.")
Code: Group by Sub-Category
if subcat_col:
    subcat_agg = (
        df_clean.groupby(subcat_col)
        .agg(total_sales=(sales_col, "sum"), total_profit=(profit_col, "sum"), n_orders=(sales_col, "count"))
        .reset_index()
        .sort_values("total_sales", ascending=False)
    )
    subcat_agg.head(10)
else:
    subcat_agg = pd.DataFrame()
    print("Sub-Category column not available; skipping sub-category aggregation.")
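The two aggregations above use pandas named aggregation. A minimal sketch on invented order lines shows the resulting shape; the figures are illustrative and not taken from the Superstore data. One point to keep in mind: `n_orders` counts non-null rows per group, so it reflects order lines rather than distinct orders.

```python
import pandas as pd

# Invented order lines for illustration only.
orders = pd.DataFrame(
    {
        "Sub-Category": ["Tables", "Tables", "Chairs", "Binders"],
        "Sales": [500.0, 300.0, 400.0, 50.0],
        "Profit": [-80.0, 20.0, 60.0, 15.0],
    }
)

agg = (
    orders.groupby("Sub-Category")
    .agg(
        total_sales=("Sales", "sum"),
        total_profit=("Profit", "sum"),
        n_orders=("Sales", "count"),  # counts rows, not unique order IDs
    )
    .reset_index()
    .sort_values("total_sales", ascending=False)
)

print(agg)
```

Here "Tables" leads on sales while its summed profit is negative, which is the pattern the notebook is meant to surface.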
Markdown: Flagging logic
Flagging logic (configurable)
You can adjust percentiles (e.g., top 10% instead of top 20%).
high_sales_pct = 0.80 # top 20% considered high sales
low_profit_pct = 0.20 # bottom 20% considered low profit
def flag_agg(df_agg):
    if df_agg.empty:
        return df_agg, None, None
    sales_th = df_agg["total_sales"].quantile(high_sales_pct)
    profit_low_th = df_agg["total_profit"].q...
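The quoted prompt is cut off inside `flag_agg`, so the rest of its body is not known. As a rough illustration of percentile-based flagging in general, the sketch below marks rows at or above the sales quantile and at or below the profit quantile. The function name `flag_high_sales_low_profit`, the inclusive comparisons, and the toy numbers are all assumptions, not the notebook's actual code.

```python
import pandas as pd


def flag_high_sales_low_profit(df_agg, high_sales_pct=0.80, low_profit_pct=0.20):
    """Hypothetical flagging: high-quantile sales combined with low-quantile profit."""
    if df_agg.empty:
        return df_agg, None, None
    sales_th = df_agg["total_sales"].quantile(high_sales_pct)
    profit_th = df_agg["total_profit"].quantile(low_profit_pct)
    # Assumed rule: inclusive comparisons on both thresholds.
    mask = (df_agg["total_sales"] >= sales_th) & (df_agg["total_profit"] <= profit_th)
    return df_agg[mask], sales_th, profit_th


# Toy aggregate with one obvious high-sales, negative-profit item.
toy_agg = pd.DataFrame(
    {
        "item": list("ABCDE"),
        "total_sales": [1000.0, 800.0, 300.0, 200.0, 100.0],
        "total_profit": [-50.0, 120.0, 40.0, 30.0, 10.0],
    }
)

flagged, sales_th, profit_th = flag_high_sales_low_profit(toy_agg)
print(flagged)
print("Sales threshold:", sales_th, "Profit threshold:", profit_th)
```

With only five rows, the default linear interpolation places the thresholds between observed values, so small aggregates can flag very few items; lowering `high_sales_pct` or raising `low_profit_pct` widens the net.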
This pull request was created from Copilot chat.