Add High Revenue, Low Profit Analysis notebook #11

Draft
Copilot wants to merge 3 commits into main from copilot/add-high-revenue-low-profit-analysis-again


Copilot AI commented Jan 5, 2026

Adds a notebook that identifies products and sub-categories with high sales but low or negative profit, which signals pricing, discount, or cost issues.

Implementation

  • Auto-detection: Uses glob to find first CSV in data/ directory
  • Flexible schema mapping: Handles column name variations (Product Name vs Product, Sub-Category vs SubCategory)
  • Configurable thresholds: Top 20% sales + (negative profit OR bottom 20% profit) marks items for review
  • Aggregation levels: Analyzes both product and sub-category totals
  • Visualization: Scatter plots (sales vs profit) with flagged item annotations
  • Export: Writes flagged datasets to notebooks/outputs/ as CSV
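The export step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper `export_flagged`, the file name `flagged_products.csv`, and the toy numbers are assumptions. Only the `flagged_strong` column and the `notebooks/outputs/` destination come from the description above, and a temporary directory stands in for that destination here.

```python
import os
import tempfile

import pandas as pd


def export_flagged(df_agg, out_dir, filename):
    """Write rows where flagged_strong is True to out_dir/filename and return the path.

    Hypothetical helper, named here for illustration only.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if it is missing
    flagged = df_agg[df_agg["flagged_strong"]]
    out_path = os.path.join(out_dir, filename)
    flagged.to_csv(out_path, index=False)
    return out_path


# Toy product-level aggregate with made-up numbers.
prod_agg = pd.DataFrame(
    {
        "Product Name": ["A", "B", "C"],
        "total_sales": [1000.0, 900.0, 50.0],
        "total_profit": [-20.0, 300.0, 5.0],
        "flagged_strong": [True, False, False],
    }
)

# A temporary directory stands in for notebooks/outputs/.
with tempfile.TemporaryDirectory() as tmp:
    path = export_flagged(prod_agg, os.path.join(tmp, "outputs"), "flagged_products.csv")
    exported = pd.read_csv(path)

print(exported)
```

Passing `index=False` keeps the pandas row index out of the CSV, so the exported file contains only the aggregate columns.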

Example Usage

# Notebook auto-detects CSV, maps columns flexibly
product_col = find_col(["Product Name", "Product"])
subcat_col = find_col(["Sub-Category", "SubCategory", "Sub Category"])

# Flags high-revenue, low-profit items
prod_agg["flagged_strong"] = (
    prod_agg["total_sales"] >= threshold_sales
) & (
    prod_agg["total_profit"] <= 0
)

Includes actionable recommendations: review discounts, COGS, pricing strategy for flagged items.

Original prompt

Create a new Jupyter notebook file at notebooks/high_revenue_low_profit_analysis.ipynb in the repository WebCraftPhil/superstore-sales-analysis. The notebook must implement an analysis to identify "High Revenue, Low Profit" products and sub-categories and include explanatory markdown. Do not modify other files.

Notebook content (cells in order):

  1. Markdown: Title & Goal

High Revenue, Low Profit Analysis (Very Strong Signal)

Goal

  • Identify products and sub-categories that generate high total sales but have low or negative profit.
  • These are strong signals for pricing, discounting, or cost issues.

What this notebook does

  • Load sales data from the data/ directory.
  • Clean and prepare the data.
  • Group by Product and Sub-Category to compute total Sales and Profit.
  • Flag items with high sales but low/negative profits.
  • Visualize and export flagged lists.
  2. Code: Imports & setup
    import os
    import glob
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Optional in Jupyter. The comment sits on its own line because text after a
    # line magic can be parsed as arguments to the magic.
    %matplotlib inline

    sns.set(style="whitegrid")
    pd.set_option("display.max_columns", 80)
    pd.set_option("display.width", 120)

  3. Code: Auto-detect CSV and load
    data_dir = "data"
    csv_files = glob.glob(os.path.join(data_dir, "*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in '{data_dir}'. Place your sales CSV in that folder.")

    data_path = csv_files[0]
    print("Using data file:", data_path)

    df = pd.read_csv(data_path, low_memory=False)
    print("Rows, columns:", df.shape)
    df.head()
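One caveat with taking `csv_files[0]`: `glob.glob` does not guarantee any particular ordering, so with several CSVs in `data/` the chosen file can vary between systems. A minimal sketch of a deterministic variant follows. The helper name `pick_csv` and the sample file names are assumptions for illustration, not part of the notebook.

```python
import glob
import os
import tempfile


def pick_csv(data_dir):
    """Return the alphabetically first CSV in data_dir, or raise if there is none.

    Hypothetical helper: sorting makes the choice reproducible across machines.
    """
    csv_files = sorted(glob.glob(os.path.join(data_dir, "*.csv")))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in '{data_dir}'.")
    return csv_files[0]


# Demonstrate on a throwaway directory with two CSVs and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b_sales.csv", "a_sales.csv", "notes.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("Sales,Profit\n1,1\n")
    chosen = os.path.basename(pick_csv(tmp))

print(chosen)
```

If the repository only ever holds one CSV, the original cell behaves the same; the sort matters only once a second file appears.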

  4. Markdown: Notes on columns
    Notes on columns
  • This notebook expects the dataset to contain at least columns similar to: Product Name (or Product), Sub-Category (or SubCategory), Sales, and Profit.
  • If your column names differ, update the mapping in the next cell.
  5. Code: Flexible column mapping & cleaning
    cols = {c.lower(): c for c in df.columns}

    def find_col(possible_names):
        # Return the first actual column whose lowercased name matches a candidate.
        for name in possible_names:
            if name.lower() in cols:
                return cols[name.lower()]
        return None

    product_col = find_col(["Product Name", "Product"])
    subcat_col = find_col(["Sub-Category", "SubCategory", "Sub Category"])
    sales_col = find_col(["Sales"])
    profit_col = find_col(["Profit"])

    print("Detected columns -> Product:", product_col, "Sub-Category:", subcat_col, "Sales:", sales_col, "Profit:", profit_col)

    if sales_col is None or profit_col is None:
        raise ValueError("Sales and/or Profit column not found. Rename columns or update mapping.")

    # Convert to numeric
    df[sales_col] = pd.to_numeric(df[sales_col], errors="coerce")
    df[profit_col] = pd.to_numeric(df[profit_col], errors="coerce")

    # Drop rows without sales or profit info
    df_clean = df.dropna(subset=[sales_col, profit_col]).copy()
    print("After dropping NA sales/profit:", df_clean.shape)

    df_clean[[c for c in [product_col, subcat_col, sales_col, profit_col] if c is not None]].head()
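To make the mapping behaviour concrete, here is a small self-contained sketch on a toy frame. It assumes a variant of `find_col` that takes the column list as an explicit argument (the cell above closes over a module-level `cols` dict instead), and the sample data is invented.

```python
import pandas as pd


def find_col(columns, possible_names):
    """Case-insensitive lookup: return the real column name for the first candidate found."""
    lookup = {c.lower(): c for c in columns}
    for name in possible_names:
        if name.lower() in lookup:
            return lookup[name.lower()]
    return None


# Toy frame with inconsistent casing and one non-numeric sales value.
df = pd.DataFrame(
    {
        "product name": ["Desk", "Chair"],
        "SubCategory": ["Tables", "Chairs"],
        "SALES": ["100.5", "n/a"],
        "Profit": ["-12.0", "3.5"],
    }
)

product_col = find_col(df.columns, ["Product Name", "Product"])
subcat_col = find_col(df.columns, ["Sub-Category", "SubCategory", "Sub Category"])
sales_col = find_col(df.columns, ["Sales"])
missing_col = find_col(df.columns, ["Discount"])  # not present in the toy frame

# errors="coerce" turns unparseable values into NaN instead of raising.
sales = pd.to_numeric(df[sales_col], errors="coerce")

print(product_col, subcat_col, sales_col, missing_col)
print(sales.tolist())
```

The match ignores case but is otherwise exact, so a header such as `Sub_Category` would not be detected unless it is added to the candidate list.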

  6. Markdown: Aggregation approach
    Aggregation
  • We'll compute total Sales and total Profit for each Product (if Product column exists) and for each Sub-Category.
  • We'll compute thresholds for "high sales" using percentile (default: top 20%) and define "low profit" as profit <= 0 or bottom 20% by profit. Both options are shown so you can choose which is more appropriate.
  7. Code: Group by Product
    if product_col:
        prod_agg = (
            df_clean.groupby(product_col)
            .agg(total_sales=(sales_col, "sum"), total_profit=(profit_col, "sum"), n_orders=(sales_col, "count"))
            .reset_index()
            .sort_values("total_sales", ascending=False)
        )
        # A bare expression inside an if-block is not rendered by Jupyter, so display it explicitly.
        display(prod_agg.head(10))
    else:
        prod_agg = pd.DataFrame()
        print("Product column not available; skipping product-level aggregation.")

  8. Code: Group by Sub-Category
    if subcat_col:
        subcat_agg = (
            df_clean.groupby(subcat_col)
            .agg(total_sales=(sales_col, "sum"), total_profit=(profit_col, "sum"), n_orders=(sales_col, "count"))
            .reset_index()
            .sort_values("total_sales", ascending=False)
        )
        # A bare expression inside an if-block is not rendered by Jupyter, so display it explicitly.
        display(subcat_agg.head(10))
    else:
        subcat_agg = pd.DataFrame()
        print("Sub-Category column not available; skipping sub-category aggregation.")
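As a quick check of what the named aggregation produces, the same `groupby(...).agg(...)` pattern can be run on a toy frame. The data below is invented, and the column names `Sub-Category`, `Sales`, and `Profit` are assumed to be the detected ones.

```python
import pandas as pd

# Five made-up order lines across two sub-categories.
orders = pd.DataFrame(
    {
        "Sub-Category": ["Tables", "Tables", "Binders", "Binders", "Binders"],
        "Sales": [500.0, 300.0, 40.0, 60.0, 20.0],
        "Profit": [-50.0, -30.0, 10.0, 15.0, 5.0],
    }
)

subcat_agg = (
    orders.groupby("Sub-Category")
    .agg(
        total_sales=("Sales", "sum"),
        total_profit=("Profit", "sum"),
        n_orders=("Sales", "count"),  # counts non-null rows per group
    )
    .reset_index()
    .sort_values("total_sales", ascending=False)
)

print(subcat_agg)
```

Here Tables comes first with 800.0 in sales and -80.0 in profit, which is the high-revenue, negative-profit pattern the notebook looks for. Note that `n_orders` counts rows, so if the dataset stores one row per order line, it reflects line items rather than distinct orders.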

  9. Markdown: Flagging logic
    Flagging logic (configurable)

  • High sales threshold: products/sub-categories in the top 20% by total sales (configurable).
  • Low profit threshold: (A) total_profit <= 0 (strict), and/or (B) bottom 20% by profit (configurable).
  • We'll create three flags:
    • high_sales_flag (boolean)
    • low_or_negative_profit_flag (profit <= 0)
    • flagged_strong (high_sales_flag & low_or_negative_profit_flag)
    You can adjust percentiles (e.g., top 10% instead of top 20%).
  10. Code: Compute thresholds & flags
    high_sales_pct = 0.80  # top 20% considered high sales
    low_profit_pct = 0.20  # bottom 20% considered low profit

    def flag_agg(df_agg):
        if df_agg.empty:
            return df_agg, None, None
        sales_th = df_agg["total_sales"].quantile(high_sales_pct)
        profit_low_th = df_agg["total_profit"].q...
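The prompt text is cut off at this point, so the rest of `flag_agg` is not visible on this page. The following is a minimal sketch of flagging logic consistent with the "Flagging logic" description above, not a reconstruction of the truncated cell. The function name `flag_agg_sketch`, the `bottom_profit_flag` column, the use of `.quantile(low_profit_pct)` for the profit threshold, and the toy data are all assumptions.

```python
import pandas as pd


def flag_agg_sketch(df_agg, high_sales_pct=0.80, low_profit_pct=0.20):
    """Add high-sales / low-profit flags to an aggregate frame (illustrative sketch only)."""
    if df_agg.empty:
        return df_agg, None, None
    out = df_agg.copy()
    sales_th = out["total_sales"].quantile(high_sales_pct)
    # Assumption: the bottom-20% profit cut-off is also computed with quantile().
    profit_low_th = out["total_profit"].quantile(low_profit_pct)

    out["high_sales_flag"] = out["total_sales"] >= sales_th
    out["low_or_negative_profit_flag"] = out["total_profit"] <= 0
    # Hypothetical extra column for option (B), bottom 20% by profit.
    out["bottom_profit_flag"] = out["total_profit"] <= profit_low_th
    out["flagged_strong"] = out["high_sales_flag"] & out["low_or_negative_profit_flag"]
    return out, sales_th, profit_low_th


# Toy product-level aggregate with made-up numbers.
prod_agg = pd.DataFrame(
    {
        "Product Name": ["A", "B", "C", "D", "E"],
        "total_sales": [1000.0, 800.0, 300.0, 200.0, 100.0],
        "total_profit": [-50.0, 120.0, 40.0, -5.0, 10.0],
    }
)

flagged, sales_th, profit_low_th = flag_agg_sketch(prod_agg)
strong = flagged.loc[flagged["flagged_strong"], "Product Name"].tolist()
print(strong, sales_th, profit_low_th)
```

In this toy data only product A clears the sales threshold and has non-positive profit. Product D loses money but is not flagged because its sales are low, which is the distinction the combined flag is meant to capture.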

This pull request was created from Copilot chat.



Copilot AI and others added 2 commits January 5, 2026 16:31
Co-authored-by: WebCraftPhil <118385120+WebCraftPhil@users.noreply.github.com>
Co-authored-by: WebCraftPhil <118385120+WebCraftPhil@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add high revenue low profit analysis notebook" to "Add High Revenue, Low Profit Analysis notebook" Jan 5, 2026
Copilot AI requested a review from WebCraftPhil January 5, 2026 16:34