This project tracks and archives daily snapshots of GitHub trending repositories & developers for the most popular programming and markup languages. GitHub's trending list changes constantly, and GitHub does not provide an API to retrieve this information retrospectively, so this repository maintains a historical archive using GitHub Actions.
A UI for this repository can be found in the Ÿ Trend section of the free Ÿ HŸPE service.
There are many GitHub Trending archives already, but I've decided to make my own, because each of them fails to satisfy one or more of my requirements:
- Store data in JSON.
- Guarantee of data scraping.
- Archive must be as small as possible.
- Archive must include Repositories & Developers.
- Archive must include "All languages" trends.
- Archive should include all popular languages.
Detailed motivation can be found in the FAQ.
The main implementation of this project involves the following steps:
- GitHub Actions: We use GitHub Actions to update the archive automatically on a regular basis. You can find the workflow configuration in the .github/workflows directory.
- Scraping GitHub Trending: We use web scraping techniques to request and parse GitHub's trending HTML pages for selected languages.
- Data Storage: Extracted data is stored in a structured JSON format in the archive directory.
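As a rough illustration of the scraping and storage steps above, the sketch below builds the trending page URL for a language and serializes one day's result to compact JSON. The `Snapshot` shape, its field names, and the `trendingUrl` and `toArchiveJson` helpers are hypothetical and are not taken from this repository. The slug rule (lowercased, URL-encoded language name) is also an assumption.

```typescript
// Hypothetical shape of one archived day; the real archive files may differ.
interface Snapshot {
  date: string; // ISO date, e.g. "2024-01-31"
  language: string; // language name as listed in this README
  items: string[]; // "owner/repo" names or developer logins, in page order
}

type Kind = "repositories" | "developers";

// Build the trending page URL for a language.
// Assumption: the URL slug is the lowercased, URL-encoded language name.
function trendingUrl(kind: Kind, language: string): string {
  const slug = encodeURIComponent(language.toLowerCase());
  const base =
    kind === "developers"
      ? "https://github.com/trending/developers"
      : "https://github.com/trending";
  return `${base}/${slug}?since=daily`;
}

// Serialize without indentation to keep the archive as small as possible.
function toArchiveJson(snapshot: Snapshot): string {
  return JSON.stringify(snapshot);
}
```

For example, `trendingUrl("repositories", "C#")` returns `https://github.com/trending/c%23?since=daily`.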
Programming languages
- C
- C#
- C++
- Dart
- Elixir
- Erlang
- Go
- Haskell
- Java
- JavaScript
- Kotlin
- Lua
- Perl
- PHP
- Python
- R
- Ruby
- Rust
- Scala
- Shell
- Swift
- TypeScript
Markup languages
- CSS
- HTML
- Markdown
Frontend frameworks
- Svelte
- Vue
Other
- HCL (HashiCorp Configuration Language)
- Makefile
- Lua
- WebAssembly
I think that with daily trends available, we can compute weekly and monthly trends ourselves.
I've tried to make this archive simple and as small as possible. All related information can be fetched using the GitHub API.
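One possible way to derive a weekly or monthly list from the daily data is to count how many days each repository appeared in the chosen period. This is only a sketch: the `DailyTrends` input shape and the `aggregateTrends` helper are hypothetical, and ranking by number of appearances is an assumption, not necessarily how GitHub computes its own weekly or monthly lists.

```typescript
// Hypothetical input: ISO date -> list of "owner/repo" names trending that day.
type DailyTrends = Record<string, string[]>;

// Rank repositories by the number of days they appeared within `dates`.
// Dates that are missing from the archive are simply skipped.
function aggregateTrends(
  daily: DailyTrends,
  dates: string[],
): [string, number][] {
  const counts = new Map<string, number>();
  for (const date of dates) {
    for (const repo of daily[date] ?? []) {
      counts.set(repo, (counts.get(repo) ?? 0) + 1);
    }
  }
  // Sort by descending count, then by name so the order is deterministic.
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0),
  );
}
```

Passing seven consecutive dates gives a weekly ranking, and passing a full month of dates gives a monthly one.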
I haven't found a description of the GitHub Trends logic. However, after some research I've assumed that daily trends are displayed not for today or yesterday, but for a rolling 24-hour window. For example, when you open the trending page at 13:00, you will see trends from 13:00 yesterday to 13:00 today.
Other projects with similar functionality update trends every hour, but at the end of the day they all end up with trends from 23:00 yesterday to 23:00 today.
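The assumed rolling window can be written down as simple date arithmetic. The `dailyWindow` helper below is hypothetical and only restates the assumption above; it does not reflect any documented GitHub behavior.

```typescript
// Return the assumed 24-hour window covered by a daily trending page
// that is opened at `viewedAt`.
function dailyWindow(viewedAt: Date): { from: Date; to: Date } {
  const DAY_MS = 24 * 60 * 60 * 1000;
  return { from: new Date(viewedAt.getTime() - DAY_MS), to: viewedAt };
}
```

Under this assumption, a page opened at 13:00 and a page opened at 23:00 on the same day describe two different, overlapping windows.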
Running workflows hourly protects us from trends page outages. If we can't fetch the data, we try again one hour later.
I was inspired by another project implemented in TypeScript and wanted to reduce development time.
The GitHub Trending Archive project is open-source software licensed under the MIT license by Anton Komarev.
CyberCog is a Social Unity of enthusiasts. Researching the best solutions in product & software development is our passion.
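A minimal sketch of this idea is shown below, assuming each hourly run is independent and a failed run simply leaves the work to the next scheduled one. The `hourlyRun` function and its `fetchTrends` and `save` callbacks are hypothetical names used for illustration, not this project's actual code.

```typescript
// Outcome of a single hourly run.
type RunResult = "saved" | "retry-next-hour";

// One hourly run: try to fetch and store the trends. On any failure, do not
// retry in a loop; the next scheduled run, one hour later, will try again.
async function hourlyRun(
  fetchTrends: () => Promise<string[]>,
  save: (items: string[]) => Promise<void>,
): Promise<RunResult> {
  try {
    const items = await fetchTrends();
    await save(items);
    return "saved";
  } catch {
    // The trends page is unavailable or could not be parsed.
    return "retry-next-hour";
  }
}
```

The scheduler provides the retry, so the run itself needs no backoff or loop logic.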
