Virginia Herrero herrerovir

👋 Hello there!

Data Scientist | Chemical Engineer | NLP Enthusiast
I build end-to-end solutions: from data wrangling to model deployment.

👩‍🔬 About Me

Hi, I'm Virginia, a data scientist with a background in chemical engineering. I specialize in transforming raw data into actionable insights, advanced models, and engaging dashboards.

💚 Motto: Clean data. Smart models. Real impact.
🐍 Python Enthusiast: Passionate about solving real-world problems through data.
🤖 Machine Learning & Deep Learning: Always leveling up in ML/DL, experimenting with new algorithms, and creating innovative solutions.
🎨 Data Visualization: Creating clear, compelling, and interactive data visualizations to tell the story behind the numbers.
📚 Clear & Reusable Code: I document everything so that both the science and the code are understandable, reproducible, and scalable.

🧰 Tech Stack

Languages & Frameworks: Python · Scikit-learn · Keras · TensorFlow · PyTorch · NumPy · Pandas
Data Visualization: Matplotlib · Seaborn · Plotly · Power BI
Databases & Querying: SQL · MySQL · PostgreSQL
Modeling & Deployment: Scikit-learn · TensorFlow · Keras · PyTorch · Hugging Face Transformers · FastAPI · Streamlit · Gradio
Tools & Environment: Jupyter Notebooks · VS Code · Git · GitHub · TensorBoard · Docker · Google Cloud

🚀 Portfolio Highlights

Here are five of my strongest, production-focused data science projects. These combine deep learning, classical ML, NLP, and deployment to solve real-world problems.

🔹 Product Category Classification with DistilBERT

Problem: Manual product categorization for e-commerce is time-consuming and prone to errors.
Solution: Fine-tuned DistilBERT for product categorization, covering end-to-end data preprocessing, model training, evaluation, and deployment. Locally, a Streamlit frontend sends requests to a FastAPI backend that handles inference; for cloud deployment, the model is hosted on Hugging Face Spaces for a seamless, interactive experience.
Impact: Achieved 96.5% accuracy, automating product tagging and improving searchability and customer experience.
Tools: DistilBERT, Streamlit, FastAPI, Hugging Face, PyTorch

👉 Try the Live Demo
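
Whatever the serving framework, the last step of inference is usually the same: turn the classifier's raw logits into probabilities and a category name. The sketch below shows that step in plain Python, as it might run inside a FastAPI handler. It assumes a softmax over the model's output logits and uses a hypothetical `ID2LABEL` map with made-up category names and logits, so it is an illustration rather than the project's code.

```python
import math

# Hypothetical label map: the project's real category names are not listed here.
ID2LABEL = {0: "Books", 1: "Clothing", 2: "Electronics", 3: "Household"}


def softmax(logits):
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits, id2label=ID2LABEL):
    """Map one row of classifier logits to (category name, probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


if __name__ == "__main__":
    # Made-up logits standing in for one DistilBERT forward pass.
    label, prob = predict_category([0.2, -1.0, 3.1, 0.5])
    print(label, round(prob, 3))
```

Subtracting the maximum logit does not change the softmax result but prevents `math.exp` from overflowing on large logits.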

🔹 GPT-2 for Tweet Sentiment Analysis

Problem: Traditional sentiment analysis struggles to capture the nuances of modern social media discourse, like slang, irony, and sarcasm.
Solution: Fine-tuned GPT-2 for tweet sentiment classification and deployed it as a Gradio app on Hugging Face Spaces for real-time, interactive predictions.
Impact: Delivers a more accurate, context-aware sentiment analysis solution, enhancing social media monitoring, brand management, and political sentiment analysis.

Tools: GPT-2, Transformers, PyTorch, Gradio, Hugging Face

👉 Try the Live Demo
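
Because GPT-2 is a decoder-only model with no dedicated classification token, sequence classifiers built on it (for example Hugging Face's `GPT2ForSequenceClassification`) score the hidden state of the last non-padding token. The plain-Python sketch below shows that pooling step plus a linear classification head. The token ids, hidden states, weights, and the three-class label order are toy values; it assumes right padding, with the `<|endoftext|>` token (id 50256) reused as the pad token, a common workaround since GPT-2 ships without one. It is not the project's actual code.

```python
# GPT-2 ships without a padding token; a common workaround is to reuse the
# <|endoftext|> token (id 50256) as the pad token.
PAD_ID = 50256


def last_non_pad_index(input_ids, pad_id=PAD_ID):
    """Position of the last real token, assuming right padding."""
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_id:
            return i
    raise ValueError("sequence contains only padding")


def linear_head(vector, weights, bias):
    """One logit per class: dot product with that class's weight row, plus bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def classify(hidden_states, input_ids, weights, bias):
    """Pool the last non-pad token's hidden state and score it with a linear head."""
    pooled = hidden_states[last_non_pad_index(input_ids)]
    logits = linear_head(pooled, weights, bias)
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    # Toy values: 4 tokens (2 real + 2 padding), 2-dim hidden states, and a
    # 3-class head ordered as (negative, neutral, positive) by assumption.
    ids = [101, 202, PAD_ID, PAD_ID]
    hidden = [[0.1, 0.2], [0.5, -0.3], [0.0, 0.0], [0.0, 0.0]]
    W = [[-1.0, 1.0], [0.0, 0.0], [1.0, -1.0]]
    b = [0.0, 0.0, 0.0]
    print(classify(hidden, ids, W, b))
```

Pooling the last real token matters because, with causal attention, it is the only position whose hidden state has seen the whole tweet.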

🔹 Industrial Steel Defect Detection with XGBoost

Problem: Manual inspection in manufacturing is slow, costly, and prone to human error, especially when detecting rare defects.
Solution: Built an XGBoost model to automate defect detection, utilizing SMOTE to address class imbalance and SHAP for model explainability. Deployed as a Streamlit app for real-time interaction and defect classification.
Impact: Enhances quality control by improving accuracy, reducing downtime, and minimizing both false positives and false negatives. SHAP-based explanations make each decision transparent, building confidence in the automated system.

Tools: XGBoost, SMOTE, SHAP, Streamlit, Scikit-learn

👉 Try the Live Demo
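
SMOTE (Synthetic Minority Over-sampling Technique) balances the classes by creating new minority samples between existing ones rather than duplicating them. Below is a minimal, dependency-free sketch of the core interpolation idea using made-up two-feature samples; it omits the refinements of a library implementation and is not the project's code.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    For each new sample: pick a random minority point, pick one of its k
    nearest minority neighbours, and place a point at a random position on
    the line segment between them.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # in [0, 1): how far to move from base toward neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy 2-feature "defect" samples (made up for illustration).
    defects = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote(defects, n_new=3, k=2):
        print(point)
```

In a real pipeline, imbalanced-learn's `SMOTE(...).fit_resample(X_train, y_train)` does this with more options. It should be applied to the training split only, so that synthetic samples never leak into evaluation.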

🔹 Air Quality Forecasting with LSTM

Problem: Predicting air quality is challenging because of complex patterns and limited external data.
Solution: Developed an LSTM model that, after thorough preprocessing, forecasts NO₂ levels from historical data alone.
Impact: Provides reliable short-term air quality forecasts to support pollution control and public health decisions.

Tools: LSTM, TensorFlow, Keras, Scikit-learn, Statsmodels
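
Forecasting from historical data alone typically means reframing the NO₂ series as a supervised problem: each input is a window of past readings and the target is a later reading. A minimal sketch of that preprocessing in plain Python follows, assuming a chronological train/test split and min-max scaling fitted on the training part only. The lookback length, horizon, split ratio, and readings are illustrative, not the project's settings.

```python
def chronological_split(series, train_frac=0.8):
    """Split without shuffling so the model is always tested on later data."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def minmax_fit(train):
    """Fit min-max scaling on training data only; returns a scaling function."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat series
    return lambda values: [(v - lo) / span for v in values]


def make_windows(series, lookback, horizon=1):
    """Frame a univariate series as supervised (X, y) pairs.

    Each X row holds `lookback` consecutive values; y is the value
    `horizon` steps after the end of that window.
    """
    X, y = [], []
    for start in range(len(series) - lookback - horizon + 1):
        end = start + lookback
        X.append(series[start:end])
        y.append(series[end + horizon - 1])
    return X, y


if __name__ == "__main__":
    # Made-up hourly readings, for illustration only.
    no2 = [21.0, 24.5, 30.1, 28.7, 26.0, 33.2, 35.8, 31.4, 29.9, 27.3]
    train, test = chronological_split(no2)
    scale = minmax_fit(train)
    X_train, y_train = make_windows(scale(train), lookback=3)
    print(len(X_train), X_train[0], y_train[0])
```

Before feeding a Keras `LSTM` layer, `X` would be converted to an array of shape `(samples, lookback, 1)`.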

🔹 Predictive Maintenance for Industrial Equipment

Problem: Unscheduled breakdowns of machinery cause costly downtime and emergency repairs in manufacturing.
Solution: Built a Random Forest classifier on sensor data to predict equipment failures, handling class imbalance and applying SHAP for model interpretability.
Impact: Reduces downtime and maintenance costs by enabling preventive maintenance before failures occur, with clear insight into the key failure drivers.

Tools: Random Forest, Scikit-learn, Imbalanced-learn, Pandas, Matplotlib
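
The description does not say exactly how class imbalance was handled. Since imbalanced-learn is listed, resampling is likely part of it. Another common option with a Random Forest is class weighting, sketched below using the heuristic behind scikit-learn's `class_weight="balanced"` and made-up failure labels.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_that_class).

    This is the heuristic scikit-learn applies for class_weight="balanced":
    rare classes get larger weights so every class carries the same total weight.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Made-up labels: 0 = healthy reading, 1 = failure (rare).
    labels = [0] * 8 + [1] * 2
    print(balanced_class_weights(labels))  # → {0: 0.625, 1: 2.5}
```

With scikit-learn itself, passing `class_weight="balanced"` to `RandomForestClassifier` applies the same weights without any extra code.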
