This repository contains the code and documentation for 360-Degree Feedback, a software tool that analyzes news stories about the Government of India in regional media using Artificial Intelligence (AI) and Machine Learning (ML) techniques. The project uses IndicBERT models, OpenAI Whisper, and other Natural Language Processing (NLP) techniques to generate transcriptions and to provide sentiment analysis and categorization of news stories. It also offers a dashboard for sorting, filtering, and inspecting content.
- Presentation: You can also view our project presentation on Google Slides: 360-Degree Feedback Software Presentation.
- Website: Visit our project website at https://news-sentinel.vercel.app to access the live application.
- Video: Demonstration Video
- PDF Documentation: You can download our PDF documentation from here.
- Front-end Repository: The source code for our front-end website is available on GitHub: Front-end Repository.
- Machine Learning Notebooks: Transcription | Classification
- IndicBERT Models: We use state-of-the-art IndicBERT models, which incorporate research from IIT Madras, to analyze regional news in major Indian languages.
- NLP Techniques: Our system employs NLP techniques, including transformers, speech-to-text, and Optical Character Recognition (OCR), to extract and analyze news content.
- Python Tools: The project integrates Python tools for web scraping, real-time notifications, and data preprocessing.
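As an illustration of the data preprocessing step mentioned above, here is a minimal sketch that turns scraped HTML into clean text before it reaches a model. It uses only the Python standard library. The names `clean_article_html` and `_TextExtractor` are hypothetical and are not taken from this repository; the actual preprocessing code may work differently.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_article_html(raw_html: str) -> str:
    """Hypothetical helper: strip markup, normalize Unicode, collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flushes any buffered trailing text
    text = " ".join(parser.parts)
    # NFC gives one canonical form for sequences that can be encoded
    # in more than one way, which is common in scraped text.
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = "<p>Hello   <b>world</b></p><script>var x = 1;</script>"
    print(clean_article_html(sample))
```

Normalizing before tokenization is a common design choice because the same visible text can otherwise map to different token sequences.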
This section lists the major frameworks and libraries used to bootstrap our project:

- Next.js: Used for building the user-friendly frontend of the dashboard.
- React: Complements Next.js for building interactive user interfaces.
- Flask: Powers the backend of our application, handling data processing and serving API endpoints.
- Node.js: Used for server-side scripting and for managing dependencies.
- AI4Bharat: Leveraged for specialized AI and NLP capabilities tailored to Indian languages and regional media analysis.
- Data Sources: Our system gathers data from diverse sources, including:
  - Web scraping of regional media websites.
  - E-paper text extraction via OCR.
  - YouTube video analysis for news content.
- Sentiment Analysis: We provide sentiment analysis for each news story, classifying it as positive, neutral, or negative.
- Categorization by Department: News stories are categorized by the Government of India department they relate to.
- User-Friendly Interface: The project features an intuitive dashboard built with Next.js for the frontend and Flask for the backend.
- Efficient Content Management: Users can sort, filter, and inspect news stories using the dashboard, giving quick access to relevant information.
- AWS Hosting: The dashboard is hosted on Amazon Web Services (AWS) for scalability and reliability.
- IndicBERT Fine-Tuning: We fine-tune a pretrained IndicBERT model to achieve high-quality predictions on regional news content.
- Prediction Pipeline: A streamlined pipeline preprocesses new data and passes it through the model for sentiment analysis and categorization.
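
The prediction pipeline and dashboard filtering described above can be sketched roughly as follows. This is an illustrative outline, not the repository's implementation: `Story`, `run_pipeline`, `filter_stories`, and `keyword_stub` are hypothetical names, the department labels are made up, and `keyword_stub` is only a stand-in for the fine-tuned IndicBERT model so that the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# A classifier maps cleaned text to (sentiment, department).
Prediction = Tuple[str, str]
Classifier = Callable[[str], Prediction]

SENTIMENTS = ("positive", "neutral", "negative")


@dataclass
class Story:
    source: str      # assumed source tags: "web", "epaper", "youtube"
    text: str
    sentiment: str
    department: str


def preprocess(text: str) -> str:
    """Collapse runs of whitespace; a real pipeline would likely do more."""
    return " ".join(text.split())


def run_pipeline(items: Iterable[Tuple[str, str]], classify: Classifier) -> List[Story]:
    """Preprocess each (source, raw_text) pair and attach model predictions."""
    stories = []
    for source, raw in items:
        text = preprocess(raw)
        if not text:
            continue  # skip items that are empty after cleaning
        sentiment, department = classify(text)
        if sentiment not in SENTIMENTS:
            raise ValueError(f"unexpected sentiment label: {sentiment!r}")
        stories.append(Story(source, text, sentiment, department))
    return stories


def filter_stories(
    stories: Iterable[Story],
    sentiment: Optional[str] = None,
    department: Optional[str] = None,
) -> List[Story]:
    """Keep stories matching the given filters; None means 'any'."""
    return [
        s for s in stories
        if (sentiment is None or s.sentiment == sentiment)
        and (department is None or s.department == department)
    ]


def keyword_stub(text: str) -> Prediction:
    """Placeholder classifier, NOT the fine-tuned IndicBERT model."""
    lowered = text.lower()
    sentiment = "negative" if "delay" in lowered else "neutral"
    department = "Railways" if "train" in lowered else "Unknown"
    return sentiment, department


if __name__ == "__main__":
    items = [
        ("web", "  Train   services face delay "),
        ("epaper", "New scheme announced"),
    ]
    for story in filter_stories(run_pipeline(items, keyword_stub), sentiment="negative"):
        print(story)
```

Passing the classifier in as a function keeps the pipeline independent of the model, so the stub could be swapped for a wrapper around the actual fine-tuned model without changing the rest of the code.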