This project develops a comprehensive Consumer Complaint Intelligence Platform designed to analyze real public consumer complaint data related to financial products. Leveraging data directly collected from the Consumer Financial Protection Bureau (CFPB) and enriched with demographic information, the platform identifies high-risk products, companies, and regions, uncovers common issues, and analyzes response patterns. The goal is to provide actionable insights for improving operational efficiency, ensuring compliance, and enhancing customer experience within the financial sector.
Financial institutions and regulatory bodies face significant challenges in proactively identifying and addressing systemic issues within financial products. Consumer complaints, while valuable, are often disparate and difficult to analyze at scale. This project addresses the need for a robust intelligence platform that can transform raw complaint data into actionable insights, serving as an early warning system for potential product or service risks. By understanding the landscape of consumer grievances, stakeholders can make data-driven decisions to mitigate risks, improve product offerings, and foster greater consumer trust.
The main objectives of this project are to:
- Identify High-Risk Areas: Pinpoint specific financial products, companies, and geographical regions with disproportionately high complaint volumes or rates to inform targeted interventions.
- Uncover Underlying Issues: Utilize Natural Language Processing (NLP) to extract common themes and root causes from consumer complaint narratives, providing qualitative insights into systemic problems.
- Assess Response Efficiency: Analyze company response times and resolution patterns to identify bottlenecks and opportunities for improving customer service and compliance with regulatory standards.
- Provide Actionable Insights: Transform raw complaint data into clear, concise, and actionable insights for stakeholders, enabling data-driven decision-making for risk mitigation and product enhancement.
- Develop an Interactive Monitoring Tool: Create a dynamic dashboard for continuous monitoring of key performance indicators (KPIs) and emerging trends in consumer complaints, serving as an early warning system.
- CFPB Consumer Complaint Database: Real, publicly available data collected directly from the Consumer Financial Protection Bureau, detailing complaints about various financial products and services. The dataset includes fields such as `date_received`, `product`, `issue`, `company`, `state`, `consumer_complaint_narrative`, `submitted_via`, `company_response_to_consumer`, `timely_response?`, and `consumer_disputed?`.
- US State Population Data: Real state-level population data used to normalize complaint volumes and calculate complaint rates per 100,000 residents.
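A minimal pandas sketch of the per-100,000 normalization, assuming the cleaned complaints table has a two-letter `state` column and the population table has hypothetical `state` and `population` columns. The population figures below are illustrative round numbers, not census data, and aggregating before merging is one possible approach rather than the project's confirmed implementation.

```python
import pandas as pd

# Toy stand-ins for the cleaned complaints and a state population table.
# Population values are illustrative round numbers, not official figures.
complaints = pd.DataFrame({"state": ["CA", "CA", "TX", "NY", "WY"]})
population = pd.DataFrame({
    "state": ["CA", "TX", "NY", "WY"],
    "population": [39_000_000, 30_000_000, 20_000_000, 580_000],
})


def complaint_rate_per_100k(complaints: pd.DataFrame, population: pd.DataFrame) -> pd.DataFrame:
    """Count complaints per state and normalize by population (per 100,000 residents)."""
    counts = complaints.groupby("state").size().rename("complaints").reset_index()
    # Inner join drops states missing from either table (e.g. territories without population rows)
    merged = counts.merge(population, on="state", how="inner")
    merged["complaint_rate_per_100k"] = merged["complaints"] / merged["population"] * 100_000
    return merged.sort_values("complaint_rate_per_100k", ascending=False)


rates = complaint_rate_per_100k(complaints, population)
print(rates)
```

Because small states have small denominators, a single complaint can produce a high rate, which is why the toy `WY` row ranks first; a minimum-count filter may be worth considering before ranking states by rate.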
- Programming Language: Python
- Data Manipulation & Analysis: Pandas, NumPy
- Data Visualization: Matplotlib, Seaborn, Plotly, Plotly Express
- Machine Learning (NLP): Scikit-learn (for TF-IDF and LDA)
- Interactive Dashboards: Plotly Dash
- Version Control: Git, GitHub
The project follows a structured data analysis pipeline:
- Data Collection: Real CFPB complaint data and US state population data were collected to form the basis of this analysis. (Script: `src/data_collection.py`)
- Data Cleaning & Preprocessing: Raw data was cleaned through date standardization, handling of missing values, text cleaning of narratives, and creation of derived features such as `response_delay_days` and `complaint_age_days`. (Script: `src/data_cleaning.py`)
- Data Enrichment: Complaint data was enriched by merging it with population data to calculate `complaint_rate_per_100k`. (Script: `src/data_cleaning.py`)
- Exploratory Data Analysis (EDA) & KPI Computation: Key Performance Indicators (KPIs) such as Total Complaints, Average Complaint Rate per 100k, Timely Response Rate, Disputed Complaint Rate, and Median Response Delay were calculated. Distributions of products, companies, and states were also analyzed. (Script: `src/eda_kpi.py`)
- Advanced Analysis: Topic modeling (using LDA) was applied to consumer narratives to identify underlying themes, and a simple anomaly detection mechanism was implemented to flag unusual spikes in complaint volumes. (Script: `src/advanced_analysis.py`)
- Visualization: A suite of interactive and static visualizations was created using Plotly, Matplotlib, and Seaborn to illustrate key findings. (Script: `src/visualizations.py`)
- Interactive Dashboard: A web-based interactive dashboard was developed using Plotly Dash to provide a dynamic view of the KPIs and insights. (Application: `dashboard/app.py`)
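The cleaning step names the derived features `response_delay_days` and `complaint_age_days` without defining them. The sketch below assumes a `date_sent_to_company` column (present in CFPB exports but not in the field list above) as the end point of the delay, and a caller-supplied `as_of` reference date for complaint age; both definitions are assumptions, as is the `clean_narrative` output column.

```python
import pandas as pd

raw = pd.DataFrame({
    "date_received": ["2023-01-02", "2023-01-05", "not a date"],
    "date_sent_to_company": ["2023-01-02", "2023-01-09", "2023-01-10"],
    "consumer_complaint_narrative": ["  Late FEE charged!! ", None, "XXXX called me"],
})


def add_derived_features(df: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Standardize dates, clean narratives, and add day-based features (assumed definitions)."""
    out = df.copy()
    for col in ("date_received", "date_sent_to_company"):
        # errors="coerce" turns unparseable values into NaT instead of raising
        out[col] = pd.to_datetime(out[col], errors="coerce")
    # Assumption: delay runs from receipt until the complaint was sent to the company
    out["response_delay_days"] = (out["date_sent_to_company"] - out["date_received"]).dt.days
    # Assumption: age is measured against a fixed reference date rather than "today"
    out["complaint_age_days"] = (pd.Timestamp(as_of) - out["date_received"]).dt.days
    # Lowercase, drop CFPB-style XXXX redaction masks and punctuation, collapse whitespace
    out["clean_narrative"] = (
        out["consumer_complaint_narrative"].fillna("")
        .str.lower()
        .str.replace(r"\bx{2,}\b", " ", regex=True)
        .str.replace(r"[^a-z0-9\s]", " ", regex=True)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )
    return out


result = add_derived_features(raw, as_of="2023-01-31")
print(result[["response_delay_days", "complaint_age_days", "clean_narrative"]])
```

Rows with unparseable dates keep `NaN` in the derived columns rather than being dropped, so the decision to exclude them can be made explicitly downstream.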
- High-Risk Products and Companies: Analysis of complaint volumes and rates shows that certain financial products (e.g., Credit card or prepaid card, Debt collection) and companies (e.g., JPMORGAN CHASE & CO., BANK OF AMERICA, NATIONAL ASSOCIATION) consistently attract more consumer complaints than others. These are the areas requiring the most immediate attention for risk mitigation and compliance.
- Geographical Hotspots: States such as CA, TX, and NY show high complaint volumes, and their complaint rates remain notable even after normalizing by population. This suggests regional variations in consumer protection needs or market practices.
- Response Efficiency Gaps: The median response delay is 1 day, yet the timely response rate is only 48.74%: although many complaints are handled quickly, a substantial share may not meet the timeliness standard. Response times also vary across `submitted_via` channels, suggesting that optimizing certain channels could improve overall response efficiency.
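A minimal sketch of the per-channel comparison, assuming a cleaned frame with `submitted_via`, `response_delay_days`, and a `timely_response?` column holding "Yes"/"No" values as in the CFPB export. The channel names and numbers are toy data, and the summary column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "submitted_via": ["Web", "Web", "Phone", "Phone", "Referral", "Referral"],
    "response_delay_days": [0, 1, 2, 4, 5, 9],
    "timely_response?": ["Yes", "Yes", "Yes", "No", "No", "No"],
})


def channel_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Complaint count, median delay, and timely-response share per submission channel."""
    # Assumption: timeliness is encoded as the string "Yes"
    grouped = df.assign(is_timely=df["timely_response?"].eq("Yes")).groupby("submitted_via")
    summary = grouped.agg(
        complaints=("is_timely", "count"),
        median_delay_days=("response_delay_days", "median"),
        timely_rate_pct=("is_timely", "mean"),
    )
    summary["timely_rate_pct"] *= 100
    # Slowest channels first
    return summary.sort_values("median_delay_days", ascending=False)


print(channel_summary(df))
```

The median is used rather than the mean because delay distributions are typically right-skewed, so a few very slow complaints would otherwise dominate the comparison.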
| KPI | Value |
|---|---|
| Total Complaints | 5000 |
| Avg Complaint Rate per 100k | 3.45 |
| Timely Response Rate | 48.74% |
| Dispute Rate | 50.44% |
| Median Response Delay (Days) | 1.0 |
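The KPI definitions are not spelled out, so the following is a minimal sketch under stated assumptions: the average rate is taken over states (one `complaint_rate_per_100k` value per state), timeliness and dispute flags use "Yes"/"No" strings, and complaints with a missing dispute flag are excluded from the dispute rate. The input rows are toy data.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["CA", "CA", "TX", "NY"],
    "complaint_rate_per_100k": [2.0, 2.0, 1.0, 3.0],
    "timely_response?": ["Yes", "No", "Yes", "Yes"],
    "consumer_disputed?": ["No", "Yes", None, "No"],
    "response_delay_days": [0, 1, 3, 2],
})


def compute_kpis(df: pd.DataFrame) -> dict:
    """Compute the headline KPIs under assumed definitions (see comments)."""
    # Assumption: average over states, not over individual complaint rows
    avg_rate = df.groupby("state")["complaint_rate_per_100k"].first().mean()
    timely = df["timely_response?"].eq("Yes").mean() * 100
    # Assumption: complaints without a dispute flag are left out of the denominator
    disputed = df["consumer_disputed?"].dropna().eq("Yes").mean() * 100
    return {
        "Total Complaints": len(df),
        "Avg Complaint Rate per 100k": round(float(avg_rate), 2),
        "Timely Response Rate": round(float(timely), 2),
        "Dispute Rate": round(float(disputed), 2),
        "Median Response Delay (Days)": float(df["response_delay_days"].median()),
    }


for name, value in compute_kpis(df).items():
    print(f"{name}: {value}")
```

Averaging per-state rates and computing a single national rate give different answers, because small states count as much as large ones in the former; which definition underlies the table above is not stated.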
This line chart illustrates the trend of consumer complaints received over time, highlighting periods of increased or decreased activity.
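The monthly counts behind a trend chart like this can also feed the "simple anomaly detection mechanism" mentioned in the pipeline. The exact rule the project uses is not documented; the sketch below substitutes a common trailing-window z-score rule, with hypothetical `window` and `z` parameters and toy dates.

```python
import pandas as pd


def flag_monthly_spikes(date_received: pd.Series, window: int = 3, z: float = 2.0) -> pd.DataFrame:
    """Flag months whose complaint count exceeds the trailing mean by more than z standard deviations."""
    monthly = pd.to_datetime(date_received).dt.to_period("M").value_counts().sort_index()
    # Reindex so months with no complaints appear as zero instead of being skipped
    full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
    monthly = monthly.reindex(full_range, fill_value=0)
    # shift(1) keeps the current month out of its own baseline
    history = monthly.shift(1).rolling(window)
    result = pd.DataFrame({
        "complaints": monthly,
        "baseline": history.mean(),
        "spread": history.std(),
    })
    # Comparisons against NaN (the first `window` months) evaluate to False
    result["is_spike"] = result["complaints"] > result["baseline"] + z * result["spread"]
    return result


dates = pd.Series(
    ["2023-01-15"] * 10 + ["2023-02-15"] * 12 + ["2023-03-15"] * 11
    + ["2023-04-15"] * 30 + ["2023-05-15"] * 12
)
report = flag_monthly_spikes(dates)
print(report)
```

In this toy series only April is flagged: its count of 30 exceeds the January to March baseline of 11 by far more than two standard deviations. A flat history gives a spread of zero, so any increase would be flagged; a minimum spread or minimum count may be worth adding in practice.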
This choropleth map visualizes the complaint rate per 100,000 residents across different US states, identifying geographical hotspots.
This heatmap shows the frequency of specific issues associated with top financial products, revealing common pain points for consumers.
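A heatmap like this needs an issue-by-product count matrix. A minimal sketch, assuming `product` and `issue` columns and a hypothetical `top_n` cutoff for "top financial products"; the issue labels below are toy values.

```python
import pandas as pd


def product_issue_matrix(df: pd.DataFrame, top_n: int = 2) -> pd.DataFrame:
    """Issue-by-product complaint counts restricted to the top_n most complained-about products."""
    top_products = df["product"].value_counts().head(top_n).index
    subset = df[df["product"].isin(top_products)]
    # Rows are issues, columns are products, cells are complaint counts
    return pd.crosstab(subset["issue"], subset["product"])


complaints = pd.DataFrame({
    "product": ["Credit card or prepaid card"] * 3 + ["Debt collection"] * 2 + ["Mortgage"],
    "issue": [
        "Fees", "Fees", "Billing dispute",
        "Attempts to collect debt not owed", "Communication tactics",
        "Trouble during payment process",
    ],
})
matrix = product_issue_matrix(complaints)
print(matrix)
```

The resulting frame can be passed to Seaborn, e.g. `sns.heatmap(matrix, annot=True, fmt="d")`; whether the project's figure shows raw counts or normalized shares is not stated.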
This stacked bar chart displays how different products' complaints are resolved, categorized by the company's response to the consumer.
This box plot illustrates the distribution of response delays (in days) across various complaint submission channels, indicating channel efficiency.
This treemap provides a hierarchical view of complaint volumes, breaking down complaints by product and then by company within each product category.
The interactive dashboard, built with Plotly Dash, provides a dynamic interface to explore the complaint data. It includes visualizations for complaints over time, state-wise complaint rates, product distribution, response delays by channel, and complaint status distribution. The dashboard is accessible by running `python3 dashboard/app.py` and navigating to http://127.0.0.1:8050/ in your browser.
A preview of the interactive dashboard, showcasing key metrics and visualizations for an in-depth exploration of consumer complaints.
To set up and run this project locally, follow these steps:
- Clone the Repository: Run `git clone https://github.com/EsamAdelAlselwi/consumer-complaint-intelligence`, then `cd consumer-complaint-intelligence`.
- Set up Python Environment: Using a virtual environment is recommended. Create it with `python3 -m venv venv` and activate it with `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`).
- Install Dependencies: `pip install pandas numpy scikit-learn matplotlib seaborn plotly dash`
- Run Data Pipeline: Execute the scripts in the following order to collect, clean, and analyze the data and generate the visualizations:
  - `python3 src/data_collection.py`
  - `python3 src/data_cleaning.py`
  - `python3 src/eda_kpi.py`
  - `python3 src/advanced_analysis.py`
  - `python3 src/visualizations.py`
- Run the Interactive Dashboard: Start the Dash application with `python3 dashboard/app.py`, then open http://127.0.0.1:8050/ in your web browser.
For any inquiries or collaborations, feel free to reach out:
- Email: esamalselwi404@gmail.com
- LinkedIn: Esam Al Selwi