  • Valladolid, Spain

herrerovir/README.md

👋 Hello there!

Data Scientist | Chemical Engineer | NLP Enthusiast
I build end-to-end solutions: from data wrangling to model deployment.

👩‍🔬 About Me

Hi, I'm Virginia, a data scientist with a background in chemical engineering. I specialize in transforming raw data into actionable insights, advanced models, and engaging dashboards.

  • 💚 Motto: Clean data. Smart models. Real impact.
  • 🐍 Python Enthusiast: Passionate about solving real-world problems through data.
  • 🤖 Machine Learning & Deep Learning: Always leveling up in ML/DL, experimenting with new algorithms, and creating innovative solutions.
  • 🎨 Data Visualization: Creating clear, compelling, and interactive data visualizations to tell the story behind the numbers.
  • 📚 Clear & Reusable Code: I document everything to ensure science and code are understandable, reproducible, and scalable.

🧰 Tech Stack

  • Languages & Frameworks: Python · Scikit-learn · Keras · TensorFlow · PyTorch · NumPy · Pandas

  • Data Visualization: Matplotlib · Seaborn · Plotly · Power BI

  • Databases & Querying: SQL · MySQL · PostgreSQL

  • Modeling & Deployment: Scikit-learn · TensorFlow · Keras · PyTorch · Hugging Face Transformers · FastAPI · Streamlit · Gradio

  • Tools & Environment: Jupyter Notebooks · VS Code · Git · GitHub · TensorBoard · Docker · Google Cloud

🚀 Portfolio Highlights

Here are five of my strongest, production-focused data science projects. These combine deep learning, classical ML, NLP, and deployment to solve real-world problems.

🔹 Product Category Classification with DistilBERT

GitHub Repo Hugging Face

  • Problem: Manual product categorization for e-commerce is time-consuming and prone to errors.
  • Solution: Fine-tuned DistilBERT for product categorization, with an end-to-end pipeline covering data preprocessing, model training, evaluation, and deployment. Locally, Streamlit serves as the frontend while FastAPI handles backend inference. For cloud deployment, the model is hosted on Hugging Face Spaces as an interactive demo.
  • Impact: Achieved 96.5% accuracy, automating product tagging and improving searchability and customer experience.
  • Tools: DistilBERT, Streamlit, FastAPI, Hugging Face, PyTorch

👉 Try the Live Demo
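As a rough illustration of the backend inference step, the sketch below turns raw classifier logits into a category label and a confidence score using softmax. It uses only the standard library. The category names, the logit values, and the helper names `softmax` and `predict_category` are assumptions made for illustration; they are not taken from the project.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits, id2label):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical label map and logits, for illustration only
id2label = {0: "Electronics", 1: "Household", 2: "Books"}
label, confidence = predict_category([2.0, 0.5, -1.0], id2label)
print(label, round(confidence, 3))
```

In a real service, the logits would come from the fine-tuned model's forward pass and the label map from its configuration.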

🔹 GPT-2 for Tweet Sentiment Analysis

GitHub Repo Hugging Face

  • Problem: Traditional sentiment analysis struggles to capture the nuances of modern social media discourse, like slang, irony, and sarcasm.
  • Solution: Fine-tuned GPT-2 for tweet sentiment classification, with deployment using Gradio via Hugging Face Spaces for real-time, interactive predictions.
  • Impact: Delivers a more accurate, context-aware sentiment analysis solution, enhancing social media monitoring, brand management, and political sentiment analysis.
  • Tools: GPT-2, Transformers, PyTorch, Gradio, Hugging Face

👉 Try the Live Demo

🔹 Industrial Steel Defect Detection with XGBoost

GitHub Repo Streamlit Cloud

  • Problem: Manual inspection in manufacturing is slow, costly, and prone to human error, especially when detecting rare defects.
  • Solution: Built an XGBoost model to automate defect detection, using SMOTE to address class imbalance and SHAP for model explainability. Deployed as a Streamlit app for real-time interaction and defect classification.
  • Impact: Enhances quality control by improving accuracy, reducing downtime, and minimizing both false positives and false negatives. Transparent decision-making builds confidence in automation.
  • Tools: XGBoost, SMOTE, SHAP, Streamlit, Scikit-learn

👉 Try the Live Demo
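To show the idea behind SMOTE, here is a simplified, standard-library sketch of its core step: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the imbalanced-learn implementation, and the function name `smote_sketch`, the sample points, and the parameter values are assumptions for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class feature vectors, for illustration only
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))
```

Because each synthetic point is a convex combination of two real minority points, it stays inside the region the minority class already occupies. Resampling of this kind is normally applied to the training split only.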

🔹 Air Quality Forecasting with LSTM

GitHub Repo

  • Problem: Predicting air quality is challenging because of complex patterns and limited external data.
  • Solution: Developed an LSTM model that forecasts NO₂ levels using only historical data after thorough preprocessing.
  • Impact: Provides reliable short-term air quality forecasts to support pollution control and public health decisions.
  • Tools: LSTM, TensorFlow, Keras, Scikit-learn, Statsmodels
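Forecasting from historical data alone usually means reframing the series as supervised pairs: a window of past readings as input and the next reading as target. The sketch below shows that windowing step in plain Python. The helper name `make_windows`, the window length, and the NO₂ values are assumptions for illustration, not the project's actual preprocessing.

```python
def make_windows(series, window):
    """Turn a 1-D series into (inputs, targets) pairs for one-step-ahead forecasting."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # past `window` readings
        targets.append(series[start + window])       # the reading that follows
    return inputs, targets


# Hypothetical hourly NO2 readings (illustrative values, not project data)
no2 = [41.0, 38.5, 36.0, 40.2, 45.1, 47.3]
X, y = make_windows(no2, window=3)
print(X[0], y[0])
```

For a Keras LSTM, windows like these would typically be reshaped to `(samples, timesteps, features)` before training.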

🔹 Predictive Maintenance for Industrial Equipment

GitHub Repo

  • Problem: Unscheduled breakdowns of machinery cause costly downtime and emergency repairs in manufacturing.
  • Solution: Built a Random Forest classifier on sensor data to predict equipment failures, handling class imbalance and applying SHAP for model interpretability.
  • Impact: Significantly reduces downtime and maintenance costs by enabling preventive maintenance, with clear insights into key failure drivers.
  • Tools: Random Forest, Scikit-learn, Imbalanced-learn, Pandas, Matplotlib
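The summary above does not say how class imbalance was handled. One common option, shown here purely as an illustration, is class weighting with the formula behind scikit-learn's `class_weight="balanced"` setting: `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` and the label counts are assumptions, not project data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 95 normal readings (0) and 5 failures (1)
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each rare failure sample counts roughly 19 times as much as a normal one during training. Resampling with Imbalanced-learn is an alternative way to get a similar effect.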

Pinned

  1. Product-category-classifier (Public)

    Fine-tuned DistilBERT model for classifying e-commerce product descriptions into categories.

    Jupyter Notebook

  2. gpt2-tweet-sentiment (Public)

    Fine-tuned GPT-2 model to classify tweets by sentiment. Reliable and fast sentiment analysis for social media.

    Jupyter Notebook

  3. Steel-fault-classifier (Public)

    A machine learning classification project aimed at predicting faults on industrial steel plates.

    Jupyter Notebook

  4. Air-quality-forecasting (Public)

    A machine learning regression project with an LSTM model to forecast NO₂ concentrations and predict air quality trends.

    Jupyter Notebook

  5. Predictive-maintenance-industrial-machinery (Public)

    A machine learning project aimed at predicting failures in an industrial milling machine using a random forest model.

    Jupyter Notebook

  6. Fitzgerald-sentiment-topic-analysis (Public)

    An NLP project exploring sentiment and themes in F. Scott Fitzgerald’s novels to reveal emotional and thematic patterns.

    Jupyter Notebook