This project focuses on detecting and blocking bullying content in social media posts and comments using machine learning and natural language processing (NLP) techniques. Built with a Gradio web interface, it provides real-time monitoring to help prevent the spread of harmful content.
You can try the live application hosted on Hugging Face Spaces:
Cyberbullying on social media can have a significant impact on mental health. This project aims to create a safer online environment by identifying bullying content in real-time and blocking it before it reaches users. The application uses a trained machine learning model to classify content and alert moderators when bullying is detected.
- Real-time Detection: Automatically detects bullying content in posts and comments.
- NLP-based Analysis: Uses natural language processing to analyze the tone and intent of content.
- Simple Interface: Easy-to-use web interface for quick checks.
To run the project locally, follow these steps:
-
Clone the repository:
git clone [https://github.com/Sai2002Praneeth/cyberbullying-app.git](https://github.com/Sai2002Praneeth/cyberbullying-app.git) cd cyberbullying-app -
Create and activate a virtual environment (for Windows):
python -m venv .venv .\.venv\Scripts\Activate
-
Install the dependencies:
pip install -r requirements.txt
-
Run the Gradio application:
python app.py
-
Open the application in your browser at
http://127.0.0.1:7860.
- Open the application in your browser.
- Enter the social media post or comment text in the provided input field.
- Click on Submit. The model will classify the content.
- The result ("Bullying" or "Non-Bullying") will be displayed in the output box.
The model was developed using machine learning and NLP techniques to analyze social media content. Key steps included:
- Data Collection: We compiled a dataset of over 40,000 social media posts and comments with labeled bullying content.
- Preprocessing: Text data was cleaned, tokenized, lemmatized, and stop words were removed.
- Feature Extraction: We extracted relevant features using the TF-IDF (Term Frequency-Inverse Document Frequency) technique.
- Model Training: Multiple classifiers were tested. The final model chosen was the Stochastic Gradient Descent (SGD) Classifier, which achieved an accuracy of 87% on the test set.
The full development process, including data exploration and model comparison, can be found in the cyberbullying-analysis repository.