This project performs a comprehensive analysis of the Data Analyst job market by examining over 8,400 job postings from LinkedIn across the USA, Canada, and Africa. The goal is to uncover key trends, identify in-demand skills, and understand the characteristics of available positions.
The entire data-to-insight pipeline is covered, starting from raw data collection, moving through cleaning and analysis in a Python Jupyter Notebook, and culminating in a fully interactive and dynamic dashboard built in Power BI. This project is designed to showcase a strong understanding of data analysis, feature engineering, and data storytelling.
Here is a preview of the interactive dashboard created to visualize the key findings of this analysis. The dashboard allows for dynamic filtering by region, seniority level, and work model to provide granular insights.
This analysis seeks to answer the following critical questions for aspiring data professionals:
- Which regions (USA, Canada, Africa) have the most job opportunities for Data Analysts?
- What is the distribution of work models (On-site, Hybrid, Remote) across these regions?
- What is the typical salary range for Data Analysts, particularly in the USA?
- Which industries are the top recruiters for Data Analyst roles?
- What are the most frequently mentioned and in-demand technical skills in job descriptions?
The dataset used for this analysis was sourced from Kaggle and consists of three separate CSV files for different regions.
- Three distinct datasets (linkedin-jobs-usa.csv, linkedin-jobs-canada.csv, linkedin-jobs-africa.csv) were each loaded into a Pandas DataFrame.
- A country column was added to each dataset before merging them into a single, unified DataFrame of 8,490 job listings.
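The loading and merging step can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the helper name `combine_regions` and the tiny stand-in frames are assumptions, and in the project each frame would come from `pd.read_csv` on one of the three CSV files.

```python
import pandas as pd


def combine_regions(frames_by_country):
    """Tag each regional DataFrame with its country, then stack them into one."""
    tagged = [
        df.assign(country=country)  # assign returns a copy, so the inputs stay untouched
        for country, df in frames_by_country.items()
    ]
    # ignore_index gives the merged frame a clean 0..n-1 index
    return pd.concat(tagged, ignore_index=True)


# Hypothetical stand-in frames (not real rows from the dataset)
usa = pd.DataFrame({"title": ["Data Analyst", "BI Analyst"]})
canada = pd.DataFrame({"title": ["Data Analyst"]})
africa = pd.DataFrame({"title": ["Junior Data Analyst"]})

jobs = combine_regions({"USA": usa, "Canada": canada, "Africa": africa})
print(jobs["country"].value_counts())
```

Adding the country column before concatenating keeps the origin of every row recoverable after the merge, which the later per-region comparisons rely on.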
This was a critical step to transform raw, messy data into a structured format suitable for analysis.
- Criteria Parsing: The criteria column, which was a string representation of a list of dictionaries, was parsed using Python's ast library. This extracted key features like seniority_level, employment_type, job_function, and industries into their own columns.
- Salary Cleaning: The salary column contained text and ranges (e.g., "$80,000 - $100,000"). A function was created to extract numerical values, handle ranges by taking the average, and create a new average_salary column.
- Skill Extraction: The description column was mined for mentions of key technical skills (e.g., SQL, Python, R, Excel, Tableau, Power BI, AWS). Boolean columns (mentions_sql, etc.) were created to flag the presence of each skill in a posting.
- Standardization: The onsite_remote column was standardized into a work_model column with capitalized, consistent values.
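The four cleaning steps above can be sketched with the standard library alone. Everything here is an assumption for illustration, not the notebook's actual implementation: the function names, the exact shape of the criteria strings (a list of one-key dictionaries), the regular expressions, and the "Unknown" fallback label.

```python
import ast
import re


def parse_criteria(raw):
    """Flatten a stringified list of one-key dicts into a single dict."""
    try:
        items = ast.literal_eval(raw)  # evaluates literals only, unlike eval
    except (ValueError, SyntaxError, TypeError):
        return {}
    if not isinstance(items, list):
        return {}
    flat = {}
    for item in items:
        if isinstance(item, dict):
            for key, value in item.items():
                # e.g. "Seniority level" -> "seniority_level"
                flat[str(key).strip().lower().replace(" ", "_")] = value
    return flat


def average_salary(text):
    """Average every dollar amount found in the text; None if there is none."""
    if not isinstance(text, str):
        return None
    found = re.findall(r"\$\s*(\d[\d,]*(?:\.\d+)?)", text)
    amounts = [float(amount.replace(",", "")) for amount in found]
    return sum(amounts) / len(amounts) if amounts else None


# Whole-word patterns; "R" stays case-sensitive so a lowercase "r" never matches.
# This is a crude heuristic: strings such as "R&D" would still be flagged.
SKILL_PATTERNS = {
    "sql": re.compile(r"\bsql\b", re.IGNORECASE),
    "python": re.compile(r"\bpython\b", re.IGNORECASE),
    "r": re.compile(r"\bR\b"),
    "excel": re.compile(r"\bexcel\b", re.IGNORECASE),
    "tableau": re.compile(r"\btableau\b", re.IGNORECASE),
    "power_bi": re.compile(r"\bpower\s*bi\b", re.IGNORECASE),
    "aws": re.compile(r"\baws\b", re.IGNORECASE),
}


def skill_flags(description):
    """Return one mentions_<skill> boolean per tracked skill."""
    text = description if isinstance(description, str) else ""
    return {
        f"mentions_{skill}": bool(pattern.search(text))
        for skill, pattern in SKILL_PATTERNS.items()
    }


def standardize_work_model(value):
    """Map spelling variants onto On-site / Hybrid / Remote."""
    if not isinstance(value, str):
        return "Unknown"  # assumed label for missing values
    key = value.strip().lower().replace("-", "").replace(" ", "")
    labels = {"onsite": "On-site", "hybrid": "Hybrid", "remote": "Remote"}
    return labels.get(key, "Unknown")


sample_criteria = "[{'Seniority level': 'Entry level'}, {'Employment type': 'Full-time'}]"
print(parse_criteria(sample_criteria))
print(average_salary("$80,000 - $100,000"))
print(skill_flags("Strong SQL and Power BI skills; Python or R a plus."))
print(standardize_work_model(" on-site "))
```

Each helper takes a single cell value, so it could be applied row by row to the corresponding column with `Series.apply`. Using `ast.literal_eval` rather than `eval` means a malformed or malicious string cannot execute arbitrary code.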
Using the cleaned dataset, an in-depth EDA was conducted in the Jupyter Notebook (linkedin_data_analyst_job_listings_analysis.ipynb) to uncover initial trends and patterns. Visualizations were created using Matplotlib and Seaborn to illustrate the findings.
The final, cleaned dataset (cleaned_global_linkedin_jobs.csv) was exported and used as the source for an interactive dashboard in Power BI. The dashboard was designed with a professional blue and grey theme and includes:
- KPI Cards: Highlighting total jobs, total companies, and average salary.
- Interactive Slicers: For filtering the entire dashboard by Country, Seniority Level, and Work Model.
- Visualizations: Including a map for geographic distribution, bar charts for top industries and skills, and a donut chart for seniority breakdown.
- Data Analysis: Python, Pandas, Jupyter Notebook
- Data Visualization: Matplotlib, Seaborn, Power BI
- Libraries Used: re (for regular expressions), ast (for parsing string literals)
The analysis revealed that the USA has a significantly higher volume of Data Analyst job postings than Canada and Africa in this dataset, suggesting a larger market.
Hybrid work is the most common model offered across all three regions, followed closely by fully On-site roles. Fully Remote positions are the least common, suggesting that while flexibility is increasing, a physical presence is still often preferred.
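A work-model breakdown of this kind could be computed with a normalized cross-tabulation. The sketch below assumes the country and work_model columns described earlier and uses a few made-up rows purely to show the mechanics; it does not reproduce the dataset's figures.

```python
import pandas as pd

# Hypothetical rows for illustration only
jobs = pd.DataFrame({
    "country": ["USA", "USA", "Canada", "Canada"],
    "work_model": ["Hybrid", "On-site", "Hybrid", "Hybrid"],
})

# normalize="index" turns each country's counts into shares that sum to 1
work_model_share = pd.crosstab(jobs["country"], jobs["work_model"], normalize="index")
print(work_model_share)
```

Normalizing by row makes regions with very different posting volumes directly comparable.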
Salary data was most robust for the USA. As expected, average_salary rises with seniority_level: Director and Executive roles command significantly higher compensation, while Entry-level and Associate roles form the lower end of the spectrum.
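One way to check the seniority pattern is to group by seniority_level and summarize average_salary. This is a sketch under assumptions: the rows below are invented to demonstrate the calculation, and the choice of the median as the summary statistic is illustrative rather than taken from the notebook.

```python
import pandas as pd

# Hypothetical rows for illustration only; None stands for a posting without salary data
jobs = pd.DataFrame({
    "seniority_level": ["Entry level", "Entry level", "Associate", "Director"],
    "average_salary": [60000.0, None, 80000.0, 150000.0],
})

salary_by_level = (
    jobs.dropna(subset=["average_salary"])       # ignore postings with no salary
        .groupby("seniority_level")["average_salary"]
        .agg(["count", "median"])                # count shows how thin each group is
        .sort_values("median")
)
print(salary_by_level)
```

Reporting the count next to the median matters here because many postings carry no salary at all, so some seniority levels may rest on only a handful of listings.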
The IT Services & Consulting and Technology, Information & Media sectors are the dominant industries hiring Data Analysts. This is followed by Financial Services, highlighting the data-centric nature of these fields.
SQL and Excel remain the most fundamental and frequently requested skills. Following closely are Python, Tableau, and Power BI, underscoring the need for a blend of database, programming, and visualization capabilities.
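Given boolean mentions_* columns like those created during cleaning, a skill ranking can be derived by taking the mean of each column, which is the share of postings that mention the skill. The frame below is a made-up example, not data from the project.

```python
import pandas as pd

# Hypothetical flags for four postings
jobs = pd.DataFrame({
    "mentions_sql": [True, True, True, False],
    "mentions_excel": [True, True, False, False],
    "mentions_python": [True, False, False, False],
})

# The mean of a boolean column is the fraction of True values
skill_share = jobs.filter(like="mentions_").mean().sort_values(ascending=False)
print(skill_share)
```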
- Clone the repository:
  git clone https://github.com/your-username/your-repository-name.git
- Install the required libraries:
  pip install pandas matplotlib seaborn jupyter
- Run the Jupyter Notebook:
  Open linkedin_data_analyst_job_listings_analysis.ipynb in Jupyter Notebook to view the full data cleaning and analysis process.
- View the Dashboard:
  The Linkedin Data Analyst Job Listings Dashboard.jpg file provides a static view of the final dashboard.
This project provides a detailed snapshot of the global Data Analyst job market, offering valuable insights for job seekers. It demonstrates an end-to-end analytical workflow, from handling complex, unstructured data to presenting findings in a clear and compelling interactive dashboard. The key takeaway is that a strong foundation in SQL and Excel, complemented by proficiency in Python and a major BI tool like Power BI or Tableau, is crucial for success in this field.
