Hi! I’m an aspiring Data Scientist with strong skills in Python, SQL, and end-to-end project development. I specialize in turning raw data into insights through data cleaning, exploratory analysis, predictive modeling, and visualization. I enjoy solving meaningful problems, building analytical tools, and applying data-driven thinking to real-world challenges.
Actively seeking opportunities as a Data Scientist, Machine Learning Engineer, or Data Analyst.
- Python
- SQL
- Pandas • NumPy • Scikit-learn • SciPy
- Data Cleaning & Wrangling
- Regression, Classification, Clustering
- Time Series Analysis
- Feature Engineering
- Model Evaluation & Cross-Validation
- Matplotlib
- Seaborn
- Plotly
- SQLite
- Jupyter Notebook • Google Colab
- Git & GitHub
- VS Code
- Flask
- RFM Analysis
- Customer Segmentation
- Window Functions
- CTEs & SQL Analytics
- EDA & Feature Analysis
- Dashboarding / Reporting
Tech: SQL, SQLite, SQLite CLI, VS Code
Concepts: Customer segmentation, RFM analysis, window functions, conditional logic
This project demonstrates customer analytics using SQL. It performs customer segmentation based on purchasing behavior, implements RFM scoring, ranks high-value customers, and analyzes spending trends using moving averages.
Advanced SQL techniques used include CTEs, window functions, and conditional statements.
Insights from this project can be used for targeted marketing, loyalty programs, and personalized customer experiences.
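As a rough illustration of the techniques named above, here is a minimal sketch that runs a CTE, `NTILE` scoring, a `CASE` segment label, and a moving average through Python's built-in `sqlite3` module. The `orders` table, its columns, the two-bucket scoring, and the "high value" threshold are all hypothetical and chosen for illustration; the project's actual schema and scoring rules may differ. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical toy schema and data; the real project's tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, '2024-01-05', 120.0), (1, '2024-02-10', 80.0), (1, '2024-03-15', 200.0),
  (2, '2024-01-20', 35.0),
  (3, '2024-02-01', 60.0), (3, '2024-03-28', 90.0),
  (4, '2023-11-11', 15.0);
""")

# CTE 1 aggregates recency, frequency, and monetary value per customer.
# CTE 2 scores each dimension into 2 buckets with NTILE (assumed bucket count).
# The final SELECT labels customers with a CASE expression (assumed threshold).
RFM_QUERY = """
WITH rfm AS (
    SELECT customer_id,
           MAX(order_date) AS last_order,
           COUNT(*)        AS frequency,
           SUM(amount)     AS monetary
    FROM orders
    GROUP BY customer_id
),
scored AS (
    SELECT customer_id,
           NTILE(2) OVER (ORDER BY last_order) AS r_score,
           NTILE(2) OVER (ORDER BY frequency)  AS f_score,
           NTILE(2) OVER (ORDER BY monetary)   AS m_score
    FROM rfm
)
SELECT customer_id, r_score, f_score, m_score,
       CASE WHEN r_score + f_score + m_score >= 5 THEN 'high value'
            ELSE 'standard' END AS segment
FROM scored
ORDER BY customer_id;
"""

# Moving average over each customer's current and two preceding orders.
MOVING_AVG_QUERY = """
SELECT customer_id, order_date, amount,
       AVG(amount) OVER (
           PARTITION BY customer_id
           ORDER BY order_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg
FROM orders
ORDER BY customer_id, order_date;
"""

segments = {row[0]: row[4] for row in conn.execute(RFM_QUERY)}
moving = conn.execute(MOVING_AVG_QUERY).fetchall()

print(segments)
for row in moving:
    print(row)
```

Using `NTILE` keeps the scoring relative to the customer base rather than tied to fixed cut-offs, which is one common way to implement RFM; fixed thresholds in a `CASE` expression would be an equally valid alternative.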
🔗 GitHub Repo: www.github.com/bonnilee/sql-customer-segmentation
Tech: Python, Flask, pdfminer.six, NLTK, Matplotlib, Pandas, HTML/CSS (Jinja)
Concepts: Text extraction, document parsing, NLP preprocessing, visualization
This web application allows users to upload a PDF and search for word occurrences by section. It automatically detects section headers using font differences, cleans extracted text, and generates charts showing word frequency per section.
Features:
- Upload and parse PDFs
- Detect and separate sections based on headers
- Clean text with NLTK stopwords
- Search for a specific keyword
- Visualize occurrences in a chart
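The counting step behind these features can be sketched roughly as follows. This is a simplified, standard-library-only illustration: it assumes the PDF text has already been extracted and split into sections, and it uses a tiny hand-written stopword set as a stand-in for the NLTK stopwords the app uses. The function names `tokenize` and `keyword_counts` and the sample sections are hypothetical.

```python
import re
from collections import Counter

# Stand-in stopword list; the app itself uses NLTK's stopwords corpus.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def keyword_counts(sections, keyword):
    """Return {section title: occurrences of keyword} for pre-split sections."""
    keyword = keyword.lower()
    return {
        title: Counter(tokenize(body))[keyword]
        for title, body in sections.items()
    }


# Hypothetical sections, as if header detection had already split the PDF text.
sections = {
    "Introduction": "The model is trained on data. Data quality matters.",
    "Methods": "We clean the data and fit a model to the data.",
    "Results": "The model performs well.",
}

counts = keyword_counts(sections, "data")
print(counts)
```

The resulting dictionary maps directly onto a bar chart with one bar per section, which is the kind of per-section view the app produces with Matplotlib.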
🔗 GitHub Repo: www.github.com/bonnilee/WordCounter
Tech: Python, Pandas, Matplotlib, Seaborn
Dataset: Most Streamed Spotify Songs 2023
A full exploratory data analysis (EDA) project exploring trends in popular songs across streaming platforms.
Highlights:
- Cleaned and transformed a complex dataset (streams, release dates, audio features)
- Analyzed top artists by song count and total streams
- Explored audio feature distributions (danceability, energy, valence, speechiness, BPM)
- Conducted correlation and heatmap analysis
- Built monthly and yearly release trend visualizations
- Examined platform presence vs. stream count
- Found that songs available on 5–6 platforms tend to have higher stream counts
Key Insights:
- Top artists include The Weeknd, Taylor Swift, Ed Sheeran, and Harry Styles.
- Most songs were released between 2019 and 2023, with peaks in 2021 and 2022.
- High danceability and moderate energy dominate popular songs.
- More platform presence generally correlates with higher stream counts.
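The platform-presence comparison can be sketched along these lines. The rows below are made-up values, not figures from the dataset, and the sketch uses only the standard library so it runs without dependencies (the project itself uses Pandas). The helper `parse_streams` is hypothetical and stands in for the cleaning step that turns raw stream counts into numbers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (track, raw stream count as text, number of platforms).
rows = [
    ("Song A", "1,200,000", 6),
    ("Song B", "300000", 2),
    ("Song C", "not available", 3),  # unparseable value, dropped in cleaning
    ("Song D", "900000", 5),
    ("Song E", "500000", 2),
    ("Song F", "1500000", 6),
]


def parse_streams(raw):
    """Convert a raw stream count to int, or return None if it is not numeric."""
    digits = raw.replace(",", "")
    return int(digits) if digits.isdigit() else None


# Group cleaned stream counts by how many platforms each track appears on.
by_platforms = defaultdict(list)
for _, raw, n_platforms in rows:
    streams = parse_streams(raw)
    if streams is not None:
        by_platforms[n_platforms].append(streams)

# Average streams per platform count, ordered by platform count.
avg_streams = {n: mean(v) for n, v in sorted(by_platforms.items())}
print(avg_streams)
```

A grouped average like this shows an association only; it does not establish that adding platforms causes higher streams, since popular songs may simply be distributed more widely.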
🔗 GitHub Repo: www.github.com/bonnilee/spotify
- LinkedIn: www.linkedin.com/in/bonni-lynch-b08a98305
- GitHub: www.github.com/bonnilee
- Email: [email protected]
- Data Scientist
- Machine Learning Engineer
- Data Analyst
Open to remote or U.S.-based roles (full-time or contract).
Thanks for visiting my profile! 🚀
Feel free to explore my repositories or reach out!