Data Scientist | Chemical Engineer | NLP Enthusiast
I build end-to-end solutions: from data wrangling to model deployment.
Hi, I'm Virginia, a data scientist with a background in chemical engineering. I specialize in transforming raw data into actionable insights, advanced models, and engaging dashboards.
- 💚 Motto: Clean data. Smart models. Real impact.
- 🐍 Python Enthusiast: Passionate about solving real-world problems through data.
- 🤖 Machine Learning & Deep Learning: Always leveling up in ML/DL, experimenting with new algorithms, and creating innovative solutions.
- 🎨 Data Visualization: Creating clear, compelling, and interactive data visualizations to tell the story behind the numbers.
- 📚 Clear & Reusable Code: I document everything to ensure science and code are understandable, reproducible, and scalable.
- Languages & Frameworks: Python · Scikit-learn · Keras · TensorFlow · PyTorch · NumPy · Pandas
- Data Visualization: Matplotlib · Seaborn · Plotly · Power BI
- Databases & Querying: SQL · MySQL · PostgreSQL
- Modeling & Deployment: Scikit-learn · TensorFlow · Keras · PyTorch · Hugging Face Transformers · FastAPI · Streamlit · Gradio
- Tools & Environment: Jupyter Notebooks · VS Code · Git · GitHub · TensorBoard · Docker · Google Cloud
Here are five of my strongest, production-focused data science projects. These combine deep learning, classical ML, NLP, and deployment to solve real-world problems.
- Problem: Manual product categorization for e-commerce is time-consuming and prone to errors.
- Solution: Fine-tuned DistilBERT for product categorization, covering data preprocessing, model training, evaluation, and deployment. Locally, Streamlit serves as the frontend while FastAPI handles backend inference; for cloud deployment, the model runs on Hugging Face Spaces as an interactive demo.
- Impact: Achieved 96.5% accuracy, automating product tagging and improving searchability and customer experience.
- Tools: DistilBERT, Streamlit, FastAPI, Hugging Face, PyTorch
- Problem: Traditional sentiment analysis struggles to capture the nuances of modern social media discourse, like slang, irony, and sarcasm.
- Solution: Fine-tuned GPT-2 for tweet sentiment classification, with deployment using Gradio via Hugging Face Spaces for real-time, interactive predictions.
- Impact: Delivers a more accurate, context-aware sentiment analysis solution, enhancing social media monitoring, brand management, and political sentiment analysis.
- Tools: GPT-2, Transformers, PyTorch, Gradio, Hugging Face
- Problem: Manual inspection in manufacturing is slow, costly, and prone to human error, especially when detecting rare defects.
- Solution: Built an XGBoost model to automate defect detection, utilizing SMOTE to address class imbalance and SHAP for model explainability. Deployed as a Streamlit app for real-time interaction and defect classification.
- Impact: Enhances quality control by improving accuracy, reducing downtime, and minimizing both false positives and false negatives. Transparent decision-making offers confidence in automation.
- Tools: XGBoost, SMOTE, SHAP, Streamlit, Scikit-learn
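
For readers unfamiliar with SMOTE, here is a minimal sketch of the core idea: a synthetic minority sample is placed on the line between a real minority point and one of its nearest minority neighbours. This is an illustration only, not the project code. The function name `smote_like_sample`, its parameters, and the sample points are all hypothetical, and a real pipeline would more likely use a library implementation such as the one in Imbalanced-learn.

```python
import random

def smote_like_sample(minority, k=2, n_new=4, seed=0):
    """Create synthetic minority points by interpolating toward a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (excluding itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        target = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(base, target)))
    return synthetic

# Made-up 2-D feature vectors for the rare "defect" class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_sample(minority)
```

Because every synthetic point is an interpolation between two real ones, it stays inside the region already covered by the minority class. Oversampling of this kind is normally applied to the training split only, so that evaluation data contains real samples.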
- Problem: Predicting air quality is challenging because of complex patterns and limited external data.
- Solution: Developed an LSTM model that forecasts NO₂ levels using only historical data after thorough preprocessing.
- Impact: Provides reliable short-term air quality forecasts to support pollution control and public health decisions.
- Tools: LSTM, TensorFlow, Keras, Scikit-learn, Statsmodels
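
Forecasting from historical data alone usually means turning one long series into many (past window, next value) pairs. The sketch below shows that windowing step under stated assumptions: the helper `make_windows`, the `lookback` and `horizon` parameters, and the NO₂ readings are hypothetical and chosen for illustration, not taken from the project.

```python
def make_windows(series, lookback, horizon=1):
    """Split a series into (input window, target) pairs for supervised training."""
    X, y = [], []
    for start in range(len(series) - lookback - horizon + 1):
        X.append(series[start:start + lookback])          # past `lookback` values
        y.append(series[start + lookback + horizon - 1])  # value `horizon` steps ahead
    return X, y

# Made-up NO2 readings, for illustration only.
no2 = [41.0, 38.5, 35.2, 30.1, 33.7, 39.9, 44.3]
X, y = make_windows(no2, lookback=3)
```

With seven readings and a lookback of three, this yields four training pairs. Before being passed to a Keras LSTM layer, the windows would be reshaped to `(samples, timesteps, features)`, which here is `(4, 3, 1)`.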
- Problem: Unscheduled breakdowns of machinery cause costly downtime and emergency repairs in manufacturing.
- Solution: Built a Random Forest classifier on sensor data to predict equipment failures, handling class imbalance and using SHAP to explain the model's predictions.
- Impact: Helps reduce downtime and maintenance costs by enabling predictive maintenance, with clear insight into the key failure drivers.
- Tools: Random Forest, Scikit-learn, Imbalanced-learn, Pandas, Matplotlib
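
One common way to handle class imbalance in failure prediction is to weight the rare class more heavily during training. The sketch below computes the "balanced" weighting heuristic, `n_samples / (n_classes * class_count)`. It is an assumption for illustration: the project may rely on resampling instead, and the function name `balanced_class_weights` and the label counts are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Made-up labels: 95 normal readings (0) and 5 failures (1).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
```

Here the failure class receives a weight of 10.0 and the normal class about 0.53, so a misclassified failure costs the model roughly 19 times more than a misclassified normal reading.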