-
In this project, I aimed to transform and analyze data from an ecommerce database to extract valuable insights regarding
-
User behavior
-
Product performance
-
Revenue generation
across different regions.
- Clean and validate the data to ensure its quality and consistency.
- Perform SQL-based transformations, including __aggregating data, calculating metrics like total revenue,and determining top-selling products.
- Use advanced SQL techniques like window functions and joins to answer specific business questions, such as identifying patterns in customer interactions and predicting product performance across cities and countries.
- Loaded the dataset from various sources (CSV, tables) into PostgreSQL using the psql tool and pgAdmin.
- Cleaned the data by removing inconsistencies, such as invalid characters in IDs and non-alphanumeric values in columns.
- Created SQL queries to aggregate data, calculate key metrics (e.g., total revenue, average order value), and analyze top-selling products and customer behavior.
- Used window functions to examine data across partitions, such as city or country, to compare product performance.
- Built views to simplify repetitive queries, particularly for top-selling products by city and country.
- Analyzed visitor interactions, product sentiment, and regional differences to answer specific questions like:
- Which products are most popular across multiple regions?
- What is the average revenue per order by region?
- What are the correlations between pageviews, time on site, and revenue?
- Performed QA checks to ensure correct data transformation and query output, including validation of join results and proper aggregation.
- By analyzing the ecommerce data, I discovered several key insights:
- Top-Selling Products: Some products, like the 'Cam Indoor Security Camera - USA' and 'Men's 100% Cotton Short Sleeve Hero Tee White,' were global top-sellers, consistently generating high revenue in multiple regions.
- Revenue Patterns: Revenue generation patterns varied significantly across different cities and countries, with certain products performing better in specific regions.
- Sentiment Analysis: Sentiment scores revealed valuable insights into customer satisfaction, which was linked to sales performance in certain product categories. -Customer Behavior: I was able to determine the page paths visitors used frequently
-
Data Inconsistencies:
- I encountered issues with non-alphanumeric characters in product IDs and other fields, which required cleaning and validation before meaningful analysis could take place.
- Complex Joins and Data Transformation: Combining multiple tables with varying granularities, especially when dealing with user sessions and product sales data, proved to be complex at times. Ensuring that all metrics were calculated correctly across different regions and products required careful attention to SQL queries.
- Large Datasets: Handling large volumes of data and performing aggregation and transformation operations took considerable time and effort. Optimizing queries for performance was essential to ensure fast results.
-
Refining Data Cleaning and Validation Processes: Implementing more advanced data cleaning methods to ensure consistency and prevent errors. This would include filtering out invalid characters and handling missing or duplicate data more efficiently.
-
Improving Query Optimization: I would work on optimizing complex queries to reduce processing time, particularly for large datasets. This could involve indexing and adjusting join strategies for more efficient data retrieval.
-
Building More Detailed Reports: Expanding the analysis to cover a broader range of business questions, like trends in specific product categories over time, or detailed customer demographics analysis.
-
Setting Up a Dashboard: Creating a more interactive reporting system using SQL queries connected to visualization tools to allow stakeholders to track key metrics in real-time.