Commit fc335d5

doc : add en readme
1 parent 1aed727 commit fc335d5

2 files changed

+296
-10
lines changed

README.en.md

Lines changed: 279 additions & 0 deletions
@@ -0,0 +1,279 @@
1+
# GitHub Stars Index
2+
3+
English | [中文](README.md)
4+
5+
> Automatically fetch GitHub Stars, generate AI summaries, and make them easily searchable.
6+
7+
## Contents
8+
9+
- [Features](#features)
10+
- [Quick Start](#quick-start)
11+
- [Configuration Reference](#configuration-reference)
12+
- [Obsidian Sync (Optional)](#obsidian-sync-optional)
13+
- [Local Installation](#local-installation)
14+
15+
---
16+
17+
## Features
18+
19+
- 🤖 **Automatic Sync**: Fetches all starred repositories from your GitHub account.
20+
- 📝 **AI Summaries**: Reads each repository's README and uses AI to generate concise summaries and technical tags.
21+
- 🏷️ **Smart Tagging**: Built-in `TAG_MAPPING` for automatic synonym merging and tech stack normalization (e.g., LLM -> Large Language Model), preventing tag explosion.
22+
- ⚡️ **High Performance**: Supports **concurrency** for AI API calls, significantly speeding up the processing of new projects.
23+
- 🗃️ **Data Driven**: Uses `data/stars.json` at runtime and publishes it to `gh-pages/data/stars.json` for custom development.
24+
- 🎨 **Template Driven**: Uses Jinja2 templates to generate Markdown and static HTML search pages.
25+
- ⏭️ **Smart Incremental Updates**: Uses AI for new projects, while **automatically updating star counts and metadata** for existing ones.
26+
- ⏰ **Automated Workflow**: Runs on a schedule via GitHub Actions, with a customizable cron expression.
27+
- 🔄 **Vault Sync (Optional)**: Automatically pushes generated `stars_zh.md` & `stars_en.md` to your **Obsidian Vault**.
28+
- 🌐 **GitHub Pages (Optional)**: Deploys a static search page with multi-language (ZH/EN) support and real-time search.
29+
- 💻 **Flexible AI Providers**: Compatible with any **OpenAI-format API** (OpenAI, Azure, local Ollama, etc.).
30+
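The synonym merging described under **Smart Tagging** can be pictured with a small sketch. The mapping entries and the `normalize_tags` helper below are hypothetical illustrations; the project's actual `TAG_MAPPING` contents and lookup rules are not shown in this README.

```python
# Hypothetical entries; the project's real TAG_MAPPING is not shown in this README.
TAG_MAPPING = {
    "llm": "Large Language Model",
    "large language models": "Large Language Model",
    "js": "JavaScript",
}


def normalize_tags(tags, mapping=TAG_MAPPING):
    """Map synonyms to a canonical tag and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for tag in tags:
        key = tag.strip().lower()
        if not key:
            continue  # ignore empty tags
        # Unknown tags pass through unchanged (apart from trimming whitespace).
        canonical = mapping.get(key, tag.strip())
        dedupe_key = canonical.lower()
        if dedupe_key not in seen:
            seen.add(dedupe_key)
            result.append(canonical)
    return result


print(normalize_tags(["LLM", "large language models", " JS ", "Rust", "rust"]))
```

Case-insensitive lookup is an assumption here; it keeps AI-generated variants such as `llm` and `LLM` from becoming separate tags.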
31+
---
32+
33+
## Process Overview
34+
35+
```mermaid
36+
graph TD
37+
Start([Start]) --> Trigger{Trigger Mode}
38+
Trigger -- "Actions (Schedule/Manual)" --> Sync[Run sync_stars.py]
39+
Trigger -- "Local (Manual Run)" --> Sync
40+
41+
Sync --> FetchGH[Fetch GitHub Stars]
42+
FetchGH --> Filter{Incremental Check}
43+
Filter -- "Processed Projects" --> UpdateMeta[Update Stars/Metadata]
44+
Filter -- "New Projects" --> FetchRD[Fetch README]
45+
46+
FetchRD --> AI[AI Summarization/Tagging]
47+
AI --> Norm[Tag Governance/Normalization]
48+
Norm --> Store[(data/stars.json)]
49+
UpdateMeta --> Store
50+
Store --> Render
51+
52+
Render[[Jinja2 Template Rendering]] --> Output
53+
54+
subgraph Output [Output Results]
55+
MD[Markdown Archive]
56+
HTML[Static HTML Search Page]
57+
end
58+
59+
Output --> Dispatch{Distribution}
60+
Dispatch -- "VAULT_SYNC" --> Obs[Push to Obsidian Vault]
61+
Dispatch -- "PAGES_SYNC" --> Pages[Deploy GitHub Pages]
62+
63+
Obs --> End([Finish])
64+
Pages --> End
65+
```
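The incremental check in the diagram can be sketched roughly as follows. This is an illustration under assumptions, not the code in `sync_stars.py`: `merge_incremental` is a hypothetical helper, the stored data is assumed to be a dict keyed by `full_name`, and `summary` / `tags` are assumed field names for the AI output.

```python
def merge_incremental(fetched, stored):
    """Split fetched repos into new vs. already-processed ones.

    fetched: list of repo dicts from the GitHub API (uses `full_name`)
    stored:  dict keyed by full_name holding previously saved records (assumed shape)
    Returns (merged, needs_ai): the merged records and the names that still need AI.
    """
    merged = {}
    needs_ai = []
    for repo in fetched:
        name = repo["full_name"]
        old = stored.get(name)
        if old is None:
            # New project: no summary yet, so queue it for README fetch + AI.
            merged[name] = {**repo, "summary": None, "tags": []}
            needs_ai.append(name)
        else:
            # Known project: keep the AI fields, refresh star count and metadata.
            merged[name] = {**old, **repo}
    return merged, needs_ai


stored = {"a/b": {"full_name": "a/b", "stargazers_count": 1, "summary": "old", "tags": ["x"]}}
fetched = [
    {"full_name": "a/b", "stargazers_count": 5},
    {"full_name": "c/d", "stargazers_count": 2},
]
merged, needs_ai = merge_incremental(fetched, stored)
print(needs_ai, merged["a/b"]["stargazers_count"])
```

In this sketch, repositories that are no longer starred simply drop out of the merged result; whether the real script does the same is not stated here.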
66+
67+
---
68+
69+
## Quick Start
70+
71+
### Step 1: Fork This Repository
72+
73+
Click the **Fork** button in the top right corner to copy this repository to your account.
74+
75+
### Step 2: Configure Environment (Choose One)
76+
77+
This project is driven by environment variables. **Priority: GitHub Secrets > .env file**.
78+
79+
#### Method A: Using GitHub Environment Variables (Recommended for continuous running)
80+
81+
Go to **Settings → Secrets and variables → Actions** in your repository:
82+
83+
**🔐 Required Secrets/Variables**
84+
- `GH_USERNAME`: The GitHub username whose stars you want to crawl.
85+
- `AI_API_KEY`: API key for your AI provider.
86+
87+
**📋 Optional Variables**
88+
These have built-in defaults and usually don't need configuration:
89+
- `AI_BASE_URL`: AI API endpoint (defaults to OpenAI).
90+
- `AI_MODEL`: Model name (defaults to `gpt-4o-mini`).
91+
- `OUTPUT_FILENAME`: Base name for generated files (defaults to `stars`).
92+
- `VAULT_SYNC_PATH`: Save directory in your Vault (defaults to `GitHub-Stars/`).
93+
- `PAGES_SYNC_ENABLED`: Whether to sync to Pages (defaults to `true`).
94+
95+
> [!TIP]
96+
> **About GitHub API Limits**:
97+
> - **Running Online (Actions)**: The workflow automatically injects `GITHUB_TOKEN`, which raises the limit to 1,000 requests/hour per repository. This is usually enough for regular incremental syncs.
98+
> - **Running Locally**: Without a `GH_TOKEN`, the limit is 60 requests/hour. If you have many stars, it's recommended to add a `GH_TOKEN` to your `.env` to increase the limit to 5,000 requests/hour.
99+
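To make the token's effect concrete, here is a minimal sketch of how a request to GitHub's "list starred repositories" endpoint might be built with and without a token. `build_starred_request` is a hypothetical helper and is not taken from `sync_stars.py`; the endpoint and headers are the public GitHub REST API ones.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def build_starred_request(username, page=1, token=None):
    """Build (but do not send) a request for one page of a user's starred repos."""
    url = f"{API_ROOT}/users/{username}/starred?per_page=100&page={page}"
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "stars-index-sketch",  # hypothetical client name
    }
    if token:
        # Authenticated requests get the higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


req = build_starred_request("octocat", page=2)
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` returns a response whose `X-RateLimit-Remaining` header shows how many calls are left in the current window.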
100+
#### Method B: Using a .env File (Best for local development)
101+
102+
1. Copy `.env.example` to `.env` in the root directory.
103+
2. Fill in the required fields in `.env`.
104+
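The stated priority (real environment variables over the `.env` file, with built-in defaults underneath) could be implemented along these lines. This is a sketch under assumptions: `parse_env_file` and `load_config` are hypothetical names, and the real script may rely on a library instead of a hand-written parser.

```python
import os

# Defaults taken from the configuration table in this README.
DEFAULTS = {
    "AI_BASE_URL": "https://api.openai.com/v1",
    "AI_MODEL": "gpt-4o-mini",
    "OUTPUT_FILENAME": "stars",
    "MAX_CONCURRENCY": "1",
}
REQUIRED = ("GH_USERNAME", "AI_API_KEY")
KNOWN_KEYS = set(DEFAULTS) | set(REQUIRED) | {"GH_TOKEN"}


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(env_path=".env", environ=None):
    """Layer the sources: defaults < .env file < process environment."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    config.update(parse_env_file(env_path))
    config.update({k: environ[k] for k in KNOWN_KEYS if environ.get(k)})
    missing = [k for k in REQUIRED if not config.get(k)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return config
```

Because the process environment is applied last, a value injected by GitHub Actions overrides the same key in a committed or local `.env`.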
105+
---
106+
107+
### Step 3: Customize Schedule Frequency
108+
109+
Edit `.github/workflows/sync.yml` to modify the `cron` expression:
110+
111+
```yaml
112+
schedule:
113+
- cron: "0 2 * * 1" # Example: run every Monday at 02:00 UTC (GitHub Actions schedules use UTC)
114+
```
115+
116+
### Step 4: Manually Trigger the First Run
117+
118+
Go to **Actions → 🌟 GitHub Stars Index 同步 → Run workflow** and click run.
119+
120+
---
121+
122+
## Configuration Reference
123+
124+
| Variable | Type | Description | Default Value |
125+
| -------------------- | ------------------------ | --------------------------------------------- | --------------------------- |
126+
| `GH_USERNAME` | Required | GitHub username to sync | - |
127+
| `AI_API_KEY` | Required | AI API Key | - |
128+
| `AI_BASE_URL` | Optional | OpenAI-compatible API endpoint | `https://api.openai.com/v1` |
129+
| `AI_MODEL` | Optional | AI model to use | `gpt-4o-mini` |
130+
| `OUTPUT_FILENAME` | Optional | Base name for generated MD/HTML files | `stars` |
131+
| `VAULT_SYNC_ENABLED` | Optional | Whether to enable Obsidian sync | `false` |
132+
| `VAULT_REPO` | Optional | Vault repository (`owner/repo`) | - |
133+
| `VAULT_SYNC_PATH` | Optional | Directory path for Vault sync | `GitHub-Stars/` |
134+
| `PAGES_SYNC_ENABLED` | Optional | Whether to deploy to GitHub Pages | `true` |
135+
| `MAX_CONCURRENCY` | Optional | AI concurrency limit (recommended 1-10) | `1` |
136+
| `GH_TOKEN` | **Strongly Recommended** | Increases API limits to prevent rate-limiting | - |
137+
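As a rough picture of what `MAX_CONCURRENCY` controls, the sketch below runs a summarization function over several repositories with a bounded thread pool. `summarize_all` and its `summarize` callback are hypothetical; the actual script may use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_all(repos, summarize, max_concurrency=1):
    """Call summarize(repo) for every repo, running at most max_concurrency at once.

    Results come back in the same order as the input list.
    """
    workers = max(1, int(max_concurrency))  # env values arrive as strings
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(summarize, repos))


# Stand-in for an AI call: upper-case the repository name.
print(summarize_all(["a/b", "c/d", "e/f"], str.upper, max_concurrency="3"))
```

Threads suit this workload because each call mostly waits on the network; raising the limit too far mainly risks hitting the AI provider's own rate limits.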
138+
---
139+
140+
## Obsidian Sync (Optional)
141+
142+
This feature automatically pushes the generated star summaries to the GitHub repository that hosts your Obsidian Vault (or to any other repository), so your notes stay up to date without manual steps.
143+
144+
### Core Mechanism
145+
**Cross-repo sync**: Many Obsidian users use GitHub to store and sync their notes. This project uses the GitHub API to push the generated Markdown files directly to your designated Vault repository.
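One way such a cross-repo push can work is GitHub's "create or update file contents" REST endpoint. The sketch below only builds the request. It is an assumption that the project uses this particular endpoint, and `build_put_file_request` is a hypothetical helper.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_put_file_request(repo, path, text, token, sha=None,
                           message="sync: update stars index"):
    """Build (but do not send) a PUT request that writes one file to a repository.

    repo:  "owner/repo", e.g. the value of VAULT_REPO
    path:  file path inside the repo, e.g. VAULT_SYNC_PATH + "stars_en.md"
    sha:   blob SHA of the existing file; the API requires it when updating
    """
    url = f"https://api.github.com/repos/{repo}/contents/{urllib.parse.quote(path)}"
    payload = {
        "message": message,
        # The API expects the file body as Base64 text.
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
    }
    if sha:
        payload["sha"] = sha
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "User-Agent": "stars-index-sketch",  # hypothetical client name
        },
    )


req = build_put_file_request("me/vault", "GitHub-Stars/stars_en.md", "# Stars\n", "TOKEN")
print(req.get_method(), req.full_url)
```

This is why the PAT needs **Contents: Read and write** on the Vault repository: the endpoint both reads the current file SHA and writes the new content.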
146+
147+
### Setup Steps
148+
149+
1. **Prepare Target Repository**: Ensure your Obsidian Vault is already hosted on GitHub.
150+
2. **Create Personal Access Token (PAT)**:
151+
- Visit the [Fine-grained PAT configuration page](https://github.com/settings/personal-access-tokens).
152+
- **Repository access**: Choose "Only select repositories" and select your **Vault repository**.
153+
- **Permissions**: Under "Repository permissions," set **Contents** to **Read and write**.
154+
- Once generated, add it as a secret named `VAULT_PAT` under this project's **Settings → Secrets and variables → Actions**.
155+
3. **Enable Sync Configuration**:
156+
- In this project's **Settings -> Variables -> Actions**:
157+
- Set `VAULT_SYNC_ENABLED` to `true`.
158+
- Set `VAULT_REPO` to `your-username/repo-name` (e.g., `iblogc/my-obsidian-vault`).
159+
- Set `VAULT_SYNC_PATH` to the desired folder in your Vault (e.g., `Reading/GitHub-Stars/`).
160+
4. **Save and Finish**: The next time the Action runs, `stars_zh.md` and `stars_en.md` will automatically appear in your Vault repository.
161+
162+
> [!TIP]
163+
> **How to view locally?**
164+
> Once the remote sync is complete, just use the **Obsidian Git** plugin to "Pull," or run `git pull` in your local vault directory. The latest star summaries will then appear in your note library.
165+
166+
---
167+
168+
## GitHub Pages Deployment (Optional)
169+
170+
This project automatically generates multi-language static web pages with real-time search functionality.
171+
172+
1. Ensure `PAGES_SYNC_ENABLED=true`.
173+
2. After running the Action once, go to **Settings -> Pages**.
174+
3. Select the `gh-pages` branch and the `/(root)` directory, then click **Save**.
175+
176+
> [!IMPORTANT]
177+
> **Data Source Migration (Compatibility for Forks)**:
178+
> - The current recommended data source is `gh-pages/data/stars.json`.
179+
> - `data/stars.json` in the `main` branch is kept only as a fallback for the initial migration (for example, the first Action run after forking).
180+
> - Normal runs will no longer commit `data/stars.json` back to the `main` branch.
181+
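The fallback order described in the note above can be sketched as a simple "first existing file wins" lookup. This assumes both copies of `stars.json` are available as local paths (for example, after checking out both branches); `load_stars` is a hypothetical helper, not the project's actual function.

```python
import json
import os


def load_stars(primary, fallback):
    """Return (data, path_used), preferring the primary data file.

    primary:  path to the gh-pages copy of data/stars.json (assumed local checkout)
    fallback: path to the main-branch copy, used only when the primary is missing
    Returns (None, None) when neither file exists, e.g. on a brand-new setup.
    """
    for path in (primary, fallback):
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                return json.load(fh), path
    return None, None
```

A caller would treat the `(None, None)` result as "no previous data" and process every starred repository as new.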
182+
---
183+
184+
## Docker Deployment
185+
186+
If you want to run this long-term on a server with automatic synchronization, Docker Compose is recommended.
187+
188+
### 1. Configuration
189+
Copy `.env.example` to `.env` and fill in the necessary information:
190+
```bash
191+
cp .env.example .env
192+
# Edit .env to fill in GH_USERNAME, AI_API_KEY, and GH_TOKEN
193+
```
194+
195+
> [!IMPORTANT]
196+
> **GH_TOKEN is Mandatory**: In Docker environments, calling the GitHub API without a token easily triggers [Rate Limiting](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api). Setting a token raises the limit from 60 to 5,000 requests per hour.
197+
198+
### 2. Start Service
199+
Launch with Docker Compose:
200+
```bash
201+
docker compose up -d
202+
```
203+
This starts two containers:
204+
- `sync`: The core sync script. By default, it runs every **24 hours**. You can adjust this by setting `SCHEDULE_HOURS` in your `.env`.
205+
- `web`: An Nginx-based static server for viewing the generated index.
206+
207+
### 3. Access the Page
208+
Open your browser and visit: `http://localhost:8080`
209+
210+
### 4. Management Commands
211+
```bash
212+
# View sync logs
213+
docker logs -f github-stars-sync
214+
215+
# Run a manual sync immediately
216+
docker compose run --rm sync
217+
218+
# Update page rendering only (skip AI calls)
219+
docker compose run --rm sync --render-only
220+
```
221+
222+
---
223+
224+
## Local Installation
225+
226+
```bash
227+
# Clone the repository
228+
git clone https://github.com/iblogc/GithubStarsIndex.git
229+
cd GithubStarsIndex
230+
231+
# Install dependencies
232+
pip install -r requirements.txt
233+
# Or use uv (recommended)
234+
uv pip install -r requirements.txt
235+
236+
# Configure using .env
237+
cp .env.example .env
238+
# Edit .env and fill in AI_API_KEY and GH_USERNAME
239+
240+
# [Normal Run] Fetch metadata, call AI for summaries, and render pages
241+
python scripts/sync_stars.py
242+
# Or
243+
uv run scripts/sync_stars.py
244+
245+
# [Render Only] Skip fetching/AI, re-render HTML/MD from local stars.json
246+
python scripts/sync_stars.py --render-only
247+
```
248+
249+
---
250+
251+
## File Structure
252+
253+
| File | Description |
254+
| :--------------------------- | :------------------------------------------------ |
255+
| `data/stars.json` | Temporary runtime data (migration entry point) |
256+
| `templates/` | Jinja2 generation templates (Markdown/HTML) |
257+
| `dist/` | Automatically generated local results (HTML / MD) |
258+
| `scripts/sync_stars.py` | Core sync and generation script |
259+
| `.github/workflows/sync.yml` | GitHub Actions scheduled workflow |
260+
| `.env.example` | Configuration example file |
261+
262+
---
263+
264+
## Appendix: Creating a GitHub Token (GH_TOKEN)
265+
266+
To ensure the program can smoothly crawl all your starred repositories, it's recommended to create a Personal Access Token (PAT).
267+
268+
### Steps:
269+
1. Go to the [GitHub Fine-grained PAT page](https://github.com/settings/personal-access-tokens/new).
270+
2. **Token name**: `Stars-Index-Sync` (or any name you prefer).
271+
3. **Expiration**: `90 days` or `Custom` is recommended.
272+
4. **Resource owner**: Select your personal account.
273+
5. **Repository access**: Choose `Public Repositories (read-only)` (or `All repositories`).
274+
6. **Permissions**: No special permissions are required; default public access is enough to fetch your stars list.
275+
7. Click **Generate token**, then **copy and save** it immediately.
276+
8. Add this token to the `GH_TOKEN` field in your `.env` file.
277+
278+
> [!TIP]
279+
> If you've enabled **Vault Sync (Obsidian Sync)**, you can reuse the same `VAULT_PAT` (with write permissions) as your `GH_TOKEN`.

README.md

Lines changed: 17 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
# GitHub Stars Index
22

3+
[English](README.en.md) | 中文
4+
35
> Automatically fetch GitHub Stars, generate AI summaries, and make them easy to search.
46
57
## Contents
@@ -18,7 +20,7 @@
1820
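- 📝 Reads each repository's README and calls AI to generate a content summary and technical tags
1921
- 🏷️ **Smart Tag Governance**: Built-in `TAG_MAPPING` library that automatically merges synonyms and normalizes tech stacks (e.g., LLM -> AI 大模型, "AI large model"), preventing tag explosion (results may still be imperfect)
2022
- ⚡️ **High Efficiency**: Supports **concurrent calls** to the AI API, greatly speeding up the processing of many new projects
21-
- 🗃️ **Data Driven**: All information is stored in `data/stars.json`, supporting custom development
23+
- 🗃️ **Data Driven**: Uses `data/stars.json` at runtime and publishes it to `gh-pages/data/stars.json`, supporting custom development
2224
- 🎨 **Template Driven**: Uses Jinja2 templates to generate Markdown and static HTML pages
2325
- ⏭️ **Smart Incremental Updates**: New projects are summarized by AI, while existing projects **automatically sync the latest star count and metadata**
2426
- ⏰ GitHub Actions **runs automatically on a schedule**, with a freely configurable cron expression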
@@ -172,6 +174,12 @@ schedule:
172174
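2. After running the Action once, go to **Settings -> Pages**.
173175
3. Under **Branch**, select `gh-pages` and the `/(root)` directory, then save.
174176

177+
> [!IMPORTANT]
178+
> **Data Source Migration (Compatibility for Forks)**:
179+
> - The currently recommended data source is `gh-pages/data/stars.json`.
180+
> - `data/stars.json` in the `main` branch is only used for first-time migration compatibility (for example, as a fallback read the first time the Action runs after forking).
181+
> - Regular runs no longer commit `data/stars.json` back to `main`.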
182+
175183
---
176184

177185
## Docker Deployment
@@ -243,14 +251,14 @@ python scripts/sync_stars.py --render-only
243251

244252
## File Structure
245253

246-
| File | Description |
247-
| :--------------------------- | :----------------------------------- |
248-
| `data/stars.json` | **Core dataset** (full data for all fetched projects) |
249-
| `templates/` | Jinja2 generation templates (Markdown/HTML) |
250-
| `dist/` | Automatically generated local results (HTML / MD) |
251-
| `scripts/sync_stars.py` | Core sync and generation script |
252-
| `.github/workflows/sync.yml` | GitHub Actions scheduled workflow |
253-
| `.env.example` | Configuration example file |
254+
| File | Description |
255+
| :--------------------------- | :--------------------------------- |
256+
| `data/stars.json` | Temporary runtime data file (migration compatibility entry point) |
257+
| `templates/` | Jinja2 generation templates (Markdown/HTML) |
258+
| `dist/` | Automatically generated local results (HTML / MD) |
259+
| `scripts/sync_stars.py` | Core sync and generation script |
260+
| `.github/workflows/sync.yml` | GitHub Actions scheduled workflow |
261+
| `.env.example` | Configuration example file |
254262

255263
---
256264

@@ -270,4 +278,3 @@ python scripts/sync_stars.py --render-only
270278

271279
> [!TIP]
272280
> If you have also enabled **Obsidian Sync (Vault Sync)**, you can reuse the `VAULT_PAT` (which has write permissions) as your `GH_TOKEN`.
273-
