This basic project aims to identify and classify Instagram accounts as either authentic or fake. By leveraging data analysis and machine learning techniques, we develop a model capable of discerning patterns and characteristics typical of fraudulent accounts. The goal is to enhance the integrity of user interactions on the platform by minimizing the presence of spam and malicious entities.
- Project Overview
- Motivation
- Data Source and Collection
- Features Used
- Methodology
- Installation
- Usage
- Results and Discussion
- Future Work
- Contributing
- License
- Acknowledgments
The proliferation of fake accounts on social media platforms has been a growing concern, affecting user experience by promoting spam and disseminating misinformation. Instagram, being one of the largest social networks, is not immune to this challenge. This project is driven by the need to safeguard the community and ensure a trustworthy social environment for users.
The dataset used in this project was compiled from publicly available sources, consisting of various features indicative of account behavior and characteristics. It includes information on follower counts, post frequency, bio descriptions, and other relevant attributes.
Note: Due to privacy considerations, personal information was anonymized and handled in accordance with data protection regulations.
- Number of followers
- Number of people followed
- Number of posts
- Presence of a profile picture
- Length and content of the bio
- Username complexity (ratio of numbers to letters)
My approach combines exploratory data analysis (EDA) to understand the dataset and feature engineering to prepare the data for modeling. We then apply several machine learning algorithms, including Logistic Regression, Random Forest, and Gradient Boosting, to classify accounts. Model performance is evaluated using accuracy, precision, recall, and F1 score metrics.
To set up the project environment, follow these steps:
git clone https://github.com/methedjango/detecting-fake-Instagram-accounts.git
cd detecting-fake-Instagram-accountsTo run the analysis and model training scripts, open the ipynb file:
jupyter notebookThe findings indicate that certain features, such as the ratio of followers to following and the presence of a bio, are significant indicators of account authenticity. The Random Forest model demonstrated the highest efficacy, achieving an accuracy of XX%.
- Integrate real-time detection capabilities into the Instagram API.
- Explore additional features, such as engagement rates and the sentiment of comments.
- Implement deep learning techniques for more sophisticated pattern recognition.
I welcome contributions to improve the accuracy and efficiency of our fake account detection model. Please refer to CONTRIBUTING.md for guidelines on submitting pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- Special thanks to the Instagram community for supporting this project.