
A Flask REST API for extracting and analyzing programming language statistics from GitHub repositories. Features an ETL pipeline, data persistence with MongoDB, and easy deployment via Docker Compose. Trigger ETL, view, and manage language stats through simple endpoints.


adibhosn/LanguagesGitHub-etl-MongoDB-flask


GitHub Analysis

A Flask REST API for extracting, processing, and analyzing programming language statistics from public GitHub repositories. Data is stored in MongoDB, and all services run together with Docker Compose.


Features

  • Run ETL (Extract, Transform, Load) for any GitHub user/organization by sending a POST request with their username and your GitHub token.
  • Store and query language statistics for any owner in MongoDB.
  • Delete language statistics for a specific owner.
  • All endpoints are documented in docs/API.md.
  • No need to install MongoDB locally: everything runs in Docker.

Quickstart

1. Clone the repository

git clone https://github.com/adibhosn/LanguagesGitHub-etl-MongoDB-flask.git
cd LanguagesGitHub-etl-MongoDB-flask

2. Configure environment variables

Create a .env file in the project root (or copy the provided .env.example) with:

MONGO_URI=mongodb://mongodb:27017/

Note: You do not need to add your GitHub token to .env. The token is sent in each POST request.
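This README does not show how the API picks up this variable. As a minimal sketch, assuming the service reads `MONGO_URI` from the process environment, a Python helper could fall back to the Compose default and reject obviously wrong values. The name `get_mongo_uri` and its validation rule are hypothetical; `mongodb+srv://` is accepted because that is the scheme MongoDB Atlas connection strings use.

```python
import os

# Hypothetical default: inside the Compose network the MongoDB container is
# reachable by its service name "mongodb", matching the .env value suggested here.
DEFAULT_MONGO_URI = "mongodb://mongodb:27017/"


def get_mongo_uri(env=None):
    """Return MONGO_URI from the environment, falling back to the Compose default.

    Hypothetical helper, not the project's code. `env` defaults to os.environ;
    pass a plain dict to override it (handy in tests).
    """
    env = os.environ if env is None else env
    uri = env.get("MONGO_URI", "").strip()
    if not uri:
        return DEFAULT_MONGO_URI
    # Accept standard and DNS seed-list (Atlas-style) connection strings only.
    if not uri.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError(f"MONGO_URI must start with mongodb:// or mongodb+srv://, got {uri!r}")
    return uri


if __name__ == "__main__":
    print(get_mongo_uri())
```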

3. Build and run with Docker Compose

docker-compose -f infra/docker/docker-compose.yml up --build

This will start both the Flask API and MongoDB.
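Containers can take a few seconds to become ready after `up`. As a hedged convenience for scripts, the health-check endpoint (`GET /`) can be polled until the API answers. The helper name `wait_for_api`, the default URL `http://localhost:5000/` (taken from the curl examples in this README), and the 30-second timeout are assumptions.

```python
import time
import urllib.request


def wait_for_api(base_url="http://localhost:5000/", timeout=30.0, interval=1.0):
    """Poll GET / until it returns HTTP 200 (hypothetical helper).

    Returns True once the API responds with 200, or False if `timeout`
    seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(base_url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError, refused connections and timeouts are all
            # OSError subclasses: the container is probably still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    print("API ready" if wait_for_api() else "API did not come up in time")
```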


API Documentation

See docs/API.md for full endpoint documentation.

Main Endpoints

Health Check

GET /

Get Language Statistics

GET /api/languages

Run ETL for a GitHub Owner

POST /api/etl

Request Body:

{
  "owner": "OWNER_NAME",
  "token": "YOUR_GITHUB_TOKEN"
}
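The README does not specify how the API validates this body. As a hedged sketch of the checks a handler could perform before starting the ETL, the hypothetical `parse_etl_body` below decodes the JSON and requires both fields to be non-empty strings. A Flask route could turn the `ValueError` into an HTTP 400 response, but that mapping is an assumption.

```python
import json


def parse_etl_body(raw: bytes) -> tuple[str, str]:
    """Validate a POST /api/etl body and return (owner, token).

    Hypothetical helper; raises ValueError with a client-facing message.
    """
    try:
        body = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"request body is not valid JSON: {exc.msg}") from exc
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    owner, token = body.get("owner"), body.get("token")
    # Both fields are required; whitespace-only values count as missing.
    for field, value in (("owner", owner), ("token", token)):
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{field}' must be a non-empty string")
    return owner.strip(), token.strip()
```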

Delete Data for an Owner

DELETE /api/etl/<owner>

Example Usage

Run ETL for Facebook:

curl -X POST http://localhost:5000/api/etl \
  -H "Content-Type: application/json" \
  -d '{"owner": "facebook", "token": "YOUR_GITHUB_TOKEN"}'

Get all language statistics:

curl http://localhost:5000/api/languages

Delete statistics for an owner:

curl -X DELETE http://localhost:5000/api/etl/facebook
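For scripting, the three curl calls can be reproduced with Python's standard library. This is a hedged sketch: the helper names are hypothetical, and because this README does not describe the response format, `send` returns the status and raw body unparsed. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # host and port used in the curl examples


def etl_request(owner, token, base_url=BASE_URL):
    """POST /api/etl with a JSON body (mirrors the first curl example)."""
    data = json.dumps({"owner": owner, "token": token}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/etl",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def languages_request(base_url=BASE_URL):
    """GET /api/languages: all stored language statistics."""
    return urllib.request.Request(f"{base_url}/api/languages", method="GET")


def delete_request(owner, base_url=BASE_URL):
    """DELETE /api/etl/<owner>, percent-encoding the owner for the URL path."""
    return urllib.request.Request(
        f"{base_url}/api/etl/{urllib.parse.quote(owner, safe='')}", method="DELETE"
    )


def send(req):
    """Send a prepared request and return (status code, decoded body)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status, resp.read().decode("utf-8")


if __name__ == "__main__":
    print(send(etl_request("facebook", "YOUR_GITHUB_TOKEN")))
    print(send(languages_request()))
    print(send(delete_request("facebook")))
```

Building the `Request` objects separately from sending them keeps the URL and body construction easy to check without a running server.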

Architecture

  • Flask API: Handles HTTP requests and orchestrates ETL.
  • ETL Service: Fetches and processes data from GitHub.
  • MongoDB: Stores repositories and language statistics.
  • Docker Compose: Orchestrates containers for API and MongoDB.

See docs/ARCHITECTURE.md for more details.
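The ETL service's code is not part of this README. As a hedged sketch of what the Extract and Transform steps could look like, the following uses GitHub's REST endpoints `GET /users/{owner}/repos` (paginated, public repositories) and `GET /repos/{owner}/{repo}/languages` (bytes of code per language). The function names, the injectable `fetch` parameter, and the shape of the output document are assumptions, not the project's actual MongoDB schema. A Load step would then write that document to MongoDB.

```python
import json
import urllib.request
from collections import Counter

API = "https://api.github.com"


def github_get(url, token):
    """GET a GitHub REST API URL with the caller's token and decode the JSON."""
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def extract(owner, token, fetch=github_get, per_page=100):
    """Extract (hypothetical): yield (repo_name, {language: bytes}) per public repo."""
    page = 1
    while True:
        repos = fetch(f"{API}/users/{owner}/repos?per_page={per_page}&page={page}", token)
        for repo in repos:
            name = repo["name"]
            yield name, fetch(f"{API}/repos/{owner}/{name}/languages", token)
        if len(repos) < per_page:
            break  # a short (or empty) page means there are no more pages
        page += 1


def transform(owner, repo_languages):
    """Transform (hypothetical): total bytes per language and percentage shares."""
    totals = Counter()
    for _name, languages in repo_languages:
        totals.update(languages)  # Counter.update adds the byte counts
    grand_total = sum(totals.values())
    return {
        "owner": owner,
        "languages": [
            {
                "language": lang,
                "bytes": n,
                "percent": round(100 * n / grand_total, 2) if grand_total else 0.0,
            }
            for lang, n in totals.most_common()  # largest first
        ],
    }
```

Passing a fake `fetch` lets the pipeline run offline in tests; in the container, the default `github_get` would call the live API with the token taken from the POST body.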


Deployment

For local use, Docker Compose is all you need.

For production (e.g., AWS ECS, EC2, or Elastic Beanstalk):

  • Build and push your Docker images to a registry (like Amazon ECR).
  • Deploy using your preferred AWS service.
  • Use MongoDB Atlas or another managed MongoDB service in production, and point MONGO_URI at it.

Never commit secrets or AWS credentials to your repository.


License

MIT License


Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

