doc : add en readme

iblogc · iblogc · commit fc335d57115a · 2026-03-07T17:48:00.000+08:00
diff --git a/README.en.md b/README.en.md
@@ -0,0 +1,279 @@
+# GitHub Stars Index
+
+English | [中文](README.md)
+
+> Automatically fetch GitHub Stars, generate AI summaries, and make them easily searchable.
+
+## Contents
+
+- [Features](#features)
+- [Quick Start](#quick-start)
+- [Configuration Reference (Environment Variables / .env)](#configuration-reference-environment-variables--env)
+- [Obsidian Sync (Optional)](#obsidian-sync-optional)
+- [Local Installation](#local-installation)
+
+---
+
+## Features
+
+- 🤖 **Automatic Sync**: Fetches all starred repositories from your GitHub account.
+- 📝 **AI Summaries**: Reads each repository's README and uses AI to generate concise summaries and technical tags.
+- 🏷️ **Smart Tagging**: Built-in `TAG_MAPPING` for automatic synonym merging and tech stack normalization (e.g., LLM -> Large Language Model), preventing tag explosion.
+- ⚡️ **High Performance**: Supports **concurrency** for AI API calls, significantly speeding up the processing of new projects.
+- 🗃️ **Data Driven**: Uses `data/stars.json` at runtime and publishes it to `gh-pages/data/stars.json` for custom development.
+- 🎨 **Template Driven**: Uses Jinja2 templates to generate Markdown and static HTML search pages.
+- ⏭️ **Smart Incremental Updates**: Uses AI for new projects, while **automatically updating star counts and metadata** for existing ones.
+- ⏰ **Automated Workflow**: Regularly runs via GitHub Actions with customizable cron schedules.
+- 🔄 **Vault Sync (Optional)**: Automatically pushes generated `stars_zh.md` & `stars_en.md` to your **Obsidian Vault**.
+- 🌐 **GitHub Pages (Optional)**: Deploys a static search page with multi-language (ZH/EN) support and real-time search.
+- 💻 **Flexible AI Providers**: Compatible with any **OpenAI-format API** (OpenAI, Azure, local Ollama, etc.).
+
+---
+
+## Process Overview
+
+```mermaid
+graph TD
+    Start([Start]) --> Trigger{Trigger Mode}
+    Trigger -- "Actions (Schedule/Manual)" --> Sync[Run sync_stars.py]
+    Trigger -- "Local (Manual Run)" --> Sync
+    
+    Sync --> FetchGH[Fetch GitHub Stars]
+    FetchGH --> Filter{Incremental Check}
+    Filter -- "Processed Projects" --> UpdateMeta[Update Stars/Metadata]
+    Filter -- "New Projects" --> FetchRD[Fetch README]
+    
+    FetchRD --> AI[AI Summarization/Tagging]
+    AI --> Norm[Tag Governance/Normalization]
+    Norm --> Store[(data/stars.json)]
+    UpdateMeta --> Store
+    Store --> Render
+    
+    Render[[Jinja2 Template Rendering]] --> Output
+    
+    subgraph Output [Output Results]
+        MD[Markdown Archive]
+        HTML[Static HTML Search Page]
+    end
+    
+    Output --> Dispatch{Distribution}
+    Dispatch -- "VAULT_SYNC" --> Obs[Push to Obsidian Vault]
+    Dispatch -- "PAGES_SYNC" --> Pages[Deploy GitHub Pages]
+    
+    Obs --> End([Finish])
+    Pages --> End
+```
+
+---
+
+## Quick Start
+
+### Step 1: Fork This Repository
+
+Click the **Fork** button in the top right corner to copy this repository to your account.
+
+### Step 2: Configure Environment (Choose One)
+
+This project is driven by environment variables. **Priority: GitHub Secrets > .env file**.
+
+#### Method A: Using GitHub Environment Variables (Recommended for continuous running)
+
+Go to **Settings → Secrets and variables → Actions** in your repository:
+
+**🔐 Required Secrets/Variables**
+- `GH_USERNAME`: The GitHub username whose stars you want to crawl.
+- `AI_API_KEY`: Your AI interface API Key.
+
+**📋 Optional Variables**
+These have built-in defaults and usually don't need configuration:
+- `AI_BASE_URL`: AI API endpoint (defaults to OpenAI).
+- `AI_MODEL`: Model name (defaults to `gpt-4o-mini`).
+- `OUTPUT_FILENAME`: Base name for generated files (defaults to `stars`).
+- `VAULT_SYNC_PATH`: Save directory in your Vault (defaults to `GitHub-Stars/`).
+- `PAGES_SYNC_ENABLED`: Whether to sync to Pages (defaults to `true`).
+
+> [!TIP]
+> **About GitHub API Limits**:
+> - **Running Online (Actions)**: The workflow automatically injects `GITHUB_TOKEN` with a high limit (1,000 requests/hour), easily handling heavy crawls.
+> - **Running Locally**: Without a `GH_TOKEN`, the limit is 60 requests/hour. If you have many stars, it's recommended to add a `GH_TOKEN` to your `.env` to increase the limit to 5,000 requests/hour.
+
+#### Method B: Using a .env File (Best for local development)
+
+1. Copy `.env.example` to `.env` in the root directory.
+2. Fill in the required fields in `.env`.
+
+---
+
+### Step 3: Customize Schedule Frequency
+
+Edit `.github/workflows/sync.yml` to modify the `cron` expression:
+
+```yaml
+schedule:
+  - cron: "0 2 * * 1"  # Example: Run every Monday at 2 AM
+```
+
+### Step 4: Manually Trigger the First Run
+
+Go to **Actions → 🌟 GitHub Stars Index 同步 → Run workflow** and click run.
+
+---
+
+## Configuration Reference
+
+| Variable             | Type                     | Description                                   | Default Value               |
+| -------------------- | ------------------------ | --------------------------------------------- | --------------------------- |
+| `GH_USERNAME`        | Required                 | GitHub username to sync                       | -                           |
+| `AI_API_KEY`         | Required                 | AI API Key                                    | -                           |
+| `AI_BASE_URL`        | Optional                 | OpenAI-compatible API endpoint                | `https://api.openai.com/v1` |
+| `AI_MODEL`           | Optional                 | AI model to use                               | `gpt-4o-mini`               |
+| `OUTPUT_FILENAME`    | Optional                 | Base name for generated MD/HTML files         | `stars`                     |
+| `VAULT_SYNC_ENABLED` | Optional                 | Whether to enable Obsidian sync               | `false`                     |
+| `VAULT_REPO`         | Optional                 | Vault repository (`owner/repo`)               | -                           |
+| `VAULT_SYNC_PATH`    | Optional                 | Directory path for Vault sync                 | `GitHub-Stars/`             |
+| `PAGES_SYNC_ENABLED` | Optional                 | Whether to deploy to GitHub Pages             | `true`                      |
+| `MAX_CONCURRENCY`    | Optional                 | AI concurrency limit (recommended 1-10)       | `1`                         |
+| `GH_TOKEN`           | **Strongly Recommended** | Increases API limits to prevent rate-limiting | -                           |
+
+---
+
+## Obsidian Sync (Optional)
+
+This feature allows you to automatically push the generated star summaries to your Obsidian Vault (or any other) GitHub repository, keeping your notes updated automatically.
+
+### Core Mechanism
+**Cross-repo sync**: Many Obsidian users use GitHub to store and sync their notes. This project uses the GitHub API to push the generated Markdown files directly to your designated Vault repository.
+
+### Setup Steps
+
+1.  **Prepare Target Repository**: Ensure your Obsidian Vault is already hosted on GitHub.
+2.  **Create Personal Access Token (PAT)**:
+    - Visit the [Fine-grained PAT configuration page](https://github.com/settings/personal-access-tokens).
+    - **Repository access**: Choose "Only select repositories" and select your **Vault repository**.
+    - **Permissions**: Under "Repository permissions," set **Contents** to **Read and write**.
+    - Once generated, add it to this project's **Settings -> Secrets -> Actions** as `VAULT_PAT`.
+3.  **Enable Sync Configuration**:
+    - In this project's **Settings -> Variables -> Actions**:
+        - Set `VAULT_SYNC_ENABLED` to `true`.
+        - Set `VAULT_REPO` to `your-username/repo-name` (e.g., `iblogc/my-obsidian-vault`).
+        - Set `VAULT_SYNC_PATH` to the desired folder in your Vault (e.g., `Reading/GitHub-Stars/`).
+4.  **Save and Finish**: The next time the Action runs, `stars_zh.md` and `stars_en.md` will automatically appear in your Vault repository.
+
+> [!TIP]
+> **How to view locally?**
+> Once the remote sync is complete, just use the **Obsidian Git** plugin to "Pull," or run `git pull` in your local vault directory. The latest star summaries will then appear in your note library.
+
+---
+
+## GitHub Pages Deployment (Optional)
+
+This project automatically generates multi-language static web pages with real-time search functionality.
+
+1. Ensure `PAGES_SYNC_ENABLED=true`.
+2. After running the Action once, go to **Settings -> Pages**.
+3. Select `gh-pages` branch and `/(root)` directory, then click Save.
+
+> [!IMPORTANT]
+> **Data Source Migration (Compatibility for Forks)**:
+> - The current recommended data source is `gh-pages/data/stars.json`.
+> - `data/stars.json` in the `main` branch is only used for initial migration compatibility.
+> - Normal runs will no longer commit `data/stars.json` back to the `main` branch.
+
+---
+
+## Docker Deployment
+
+If you want to run this long-term on a server with automatic synchronization, Docker Compose is recommended.
+
+### 1. Configuration
+Copy `.env.example` to `.env` and fill in the necessary information:
+```bash
+cp .env.example .env
+# Edit .env to fill in GH_USERNAME, AI_API_KEY, and GH_TOKEN
+```
+
+> [!IMPORTANT]
+> **GH_TOKEN is Mandatory**: In Docker environments, calling the GitHub API without a token easily triggers [Rate Limiting](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api). Configuration increases the limit from 60 to 5,000 requests per hour.
+
+### 2. Start Service
+Launch with Docker Compose:
+```bash
+docker compose up -d
+```
+This starts two containers:
+- `sync`: The core sync script. By default, it runs every **24 hours**. You can adjust this by setting `SCHEDULE_HOURS` in your `.env`.
+- `web`: An Nginx-based static server for viewing the generated index.
+
+### 3. Access the Page
+Open your browser and visit: `http://localhost:8080`
+
+### 4. Management Commands
+```bash
+# View sync logs
+docker logs -f github-stars-sync
+
+# Run a manual sync immediately
+docker compose run --rm sync
+
+# Update page rendering only (skip AI calls)
+docker compose run --rm sync --render-only
+```
+
+---
+
+## Local Installation
+
+```bash
+# Clone the repository and install dependencies
+git clone https://github.com/iblogc/GithubStarsIndex.git
+cd GithubStarsIndex
+
+# Install dependencies
+pip install -r requirements.txt
+# Or use uv (recommended)
+uv pip install -r requirements.txt
+
+# Configure using .env
+cp .env.example .env
+# Edit .env and fill in AI_API_KEY and GH_USERNAME
+
+# [Normal Run] Fetch metadata, call AI for summaries, and render pages
+python scripts/sync_stars.py
+# Or
+uv run scripts/sync_stars.py
+
+# [Render Only] Skip fetching/AI, re-render HTML/MD from local stars.json
+python scripts/sync_stars.py --render-only
+```
+
+---
+
+## File Structure
+
+| File                         | Description                                       |
+| :--------------------------- | :------------------------------------------------ |
+| `data/stars.json`            | Temporary runtime data (migration entry point)    |
+| `templates/`                 | Jinja2 generation templates (Markdown/HTML)       |
+| `dist/`                      | Automatically generated local results (HTML / MD) |
+| `scripts/sync_stars.py`      | Core sync and generation script                   |
+| `.github/workflows/sync.yml` | GitHub Actions scheduled workflow                 |
+| `.env.example`               | Configuration example file                        |
+
+---
+
+## Appendix: Applying for a GitHub Token (GH_TOKEN)
+
+To ensure the program can smoothly crawl all your starred repositories, it's recommended to create a Personal Access Token (PAT).
+
+### Steps:
+1.  Go to the [GitHub Fine-grained PAT page](https://github.com/settings/personal-access-tokens/new).
+2.  **Token name**: `Stars-Index-Sync` (or any name you prefer).
+3.  **Expiration**: `90 days` or `Custom` is recommended.
+4.  **Resource owner**: Select your personal account.
+5.  **Repository access**: Choose `Public Repositories (read-only)` (or `All repositories`).
+6.  **Permissions**: No special permissions are required; default public access is enough to fetch your stars list.
+7.  Click **Generate token**, then **copy and save** it immediately.
+8.  Add this token to the `GH_TOKEN` field in your `.env` file.
+
+> [!TIP]
+> If you've enabled **Vault Sync (Obsidian Sync)**, you can reuse the same `VAULT_PAT` (with write permissions) as your `GH_TOKEN`.
diff --git a/README.md b/README.md
@@ -1,5 +1,7 @@
 # GitHub Stars Index
 
+[English](README.en.md) | 中文
+
 > 自动抓取 GitHub Stars，生成 AI 摘要，便于检索。
 
 ## 目录
@@ -18,7 +20,7 @@
 - 📝 为每个仓库读取 README，调用 AI 生成内容摘要和技术标签
 - 🏷️ **标签智能治理**：内置 `TAG_MAPPING` 映射库，自动合并同义词、归一化技术栈（如 LLM -> AI 大模型），拒绝标签爆炸（可能效果也不好）
 - ⚡️ **高效率**：支持**并发调用** AI 接口，大幅提升处理大量新项目时的速度
-- 🗃️ **数据驱动**：所有信息存储为 `data/stars.json`，支持二次开发
+- 🗃️ **数据驱动**：运行时使用 `data/stars.json`，发布到 `gh-pages/data/stars.json`，支持二次开发
 - 🎨 **模版驱动**：使用 Jinja2 模版生成 Markdown 和 HTML 静态页面
 - ⏭️ **智能增量**：新项目调用 AI 总结，旧项目**自动同步最新的 Star 数和元数据**
 - ⏰ GitHub Actions **定时自动运行**，cron 表达式自由配置
@@ -172,6 +174,12 @@ schedule:
 2. 运行一次 Action 后，进入 **Settings -> Pages**。
 3. **Branch** 选择 `gh-pages`，目录选择 `/(root)`，保存。
 
+> [!IMPORTANT]
+> **数据源迁移说明（兼容 Fork）**：
+> - 当前推荐的数据源为 `gh-pages/data/stars.json`。
+> - `main` 分支中的 `data/stars.json` 仅用于首次迁移兼容（例如 Fork 后第一次运行 Action 的回退读取）。
+> - 常规运行不会再把 `data/stars.json` 提交回 `main`。
+
 ---
 
 ## Docker 部署
@@ -243,14 +251,14 @@ python scripts/sync_stars.py --render-only
 
 ## 文件说明
 
-| 文件                         | 说明                                 |
-| :--------------------------- | :----------------------------------- |
-| `data/stars.json`            | **核心数据集**（抓取的全量项目数据） |
-| `templates/`                 | Jinja2 生成模版（Markdown/HTML）     |
-| `dist/`                      | 自动生成的本地成品（HTML / MD）      |
-| `scripts/sync_stars.py`      | 核心同步与生成脚本                   |
-| `.github/workflows/sync.yml` | GitHub Actions 定时工作流            |
-| `.env.example`               | 配置示例文件                         |
+| 文件                         | 说明                               |
+| :--------------------------- | :--------------------------------- |
+| `data/stars.json`            | 运行时临时数据文件（兼容迁移入口） |
+| `templates/`                 | Jinja2 生成模版（Markdown/HTML）   |
+| `dist/`                      | 自动生成的本地成品（HTML / MD）    |
+| `scripts/sync_stars.py`      | 核心同步与生成脚本                 |
+| `.github/workflows/sync.yml` | GitHub Actions 定时工作流          |
+| `.env.example`               | 配置示例文件                       |
 
 ---
 
@@ -270,4 +278,3 @@ python scripts/sync_stars.py --render-only
 
 > [!TIP]
 > 如果你也开启了 **Obsidian 同步 (Vault Sync)**，可以直接复用具有写入权限的 `VAULT_PAT` 作为 `GH_TOKEN`。
-