Commit 5a6a5fe

docs: add AGENTS.md and update docs & gitignore
Add AI-assistant guidance files (AGENTS.md) at repository root and under vermeer, and expand documentation across the project: significantly update top-level README.md, computer/README.md, and vermeer/README.md with architecture, quick-starts, build/test instructions, and examples. Also update CI badge link in README and add AI-assistant-specific ignore patterns to .gitignore and vermeer/.gitignore to avoid tracking assistant artifacts.
1 parent cec0f80 commit 5a6a5fe

7 files changed: 1664 additions & 92 deletions

.gitignore

Lines changed: 9 additions & 0 deletions
@@ -55,6 +55,15 @@ build/
 *.log
 *.pyc
 
+# AI assistant specific files (we only maintain AGENTS.md)
+CLAUDE.md
+GEMINI.md
+CURSOR.md
+COPILOT.md
+.cursorrules
+.cursor/
+.github/copilot-instructions.md
+
 # maven ignore
 
 apache-hugegraph-*-incubating-*/

AGENTS.md

Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
# AGENTS.md

This file provides guidance to AI coding assistants when working with code in this repository.

## Repository Overview

This is the Apache HugeGraph-Computer repository containing two distinct graph computing systems:

1. **computer** (Java/Maven): A distributed BSP/Pregel-style graph processing framework that runs on Kubernetes or YARN
2. **vermeer** (Go): A high-performance in-memory graph computing platform with master-worker architecture

Both integrate with HugeGraph for graph data input/output.

## Build & Test Commands

### Computer (Java)

**Prerequisites:**
- JDK 11 for building/running
- JDK 8 for HDFS dependencies
- Maven 3.5+
- For K8s module: run `mvn clean install` first to generate CRD classes under computer-k8s

**Build:**
```bash
cd computer
mvn clean compile -Dmaven.javadoc.skip=true
```

**Tests:**
```bash
# Unit tests
mvn test -P unit-test

# Integration tests
mvn test -P integrate-test
```

**Run single test:**
```bash
# Run specific test class
mvn test -P unit-test -Dtest=ClassName

# Run specific test method
mvn test -P unit-test -Dtest=ClassName#methodName
```

**License check:**
```bash
mvn apache-rat:check
```

**Package:**
```bash
mvn clean package -DskipTests
```

### Vermeer (Go)

**Prerequisites:**
- Go 1.23+
- `curl` and `unzip` (for downloading binary dependencies)

**First-time setup:**
```bash
cd vermeer
make init  # Downloads supervisord and protoc binaries, installs Go deps
```

**Build:**
```bash
make                    # Build for current platform
make build-linux-amd64
make build-linux-arm64
```

**Development build with hot-reload UI:**
```bash
go build -tags=dev
```

**Clean:**
```bash
make clean      # Remove built binaries and generated assets
make clean-all  # Also remove downloaded tools
```

**Run:**
```bash
# Using binary directly
./vermeer --env=master
./vermeer --env=worker

# Using script (configure in vermeer.sh)
./vermeer.sh start master
./vermeer.sh start worker
```

**Regenerate protobuf (if proto files changed):**
```bash
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.28.0
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.2.0
# The osxm1 directory name suggests the Apple Silicon macOS build of protoc;
# on other platforms, point at the protoc binary that matches your system.
tools/protoc/osxm1/protoc *.proto --go-grpc_out=. --go_out=.
```

## Architecture

### Computer (Java) - BSP/Pregel Framework

**Module Structure:**
- `computer-api`: Public interfaces for graph processing (Computation, Vertex, Edge, Aggregator, Combiner, GraphFactory)
- `computer-core`: Runtime implementation (WorkerService, MasterService, messaging, BSP coordination, managers)
- `computer-algorithm`: Built-in algorithms (PageRank, LPA, WCC, SSSP, TriangleCount, etc.)
- `computer-driver`: Job submission and driver-side coordination
- `computer-k8s`: Kubernetes deployment integration
- `computer-yarn`: YARN deployment integration
- `computer-k8s-operator`: Kubernetes operator for job management
- `computer-dist`: Distribution packaging
- `computer-test`: Integration and unit tests

**Key Design Patterns:**

1. **API/Implementation Separation**: Algorithms depend only on `computer-api` interfaces; `computer-core` provides runtime implementation. Algorithms are dynamically loaded via config.

2. **Manager Pattern**: `WorkerService` composes multiple managers (MessageSendManager, MessageRecvManager, WorkerAggrManager, DataServerManager, SortManagers, SnapshotManager, etc.) with lifecycle hooks: `initAll()`, `beforeSuperstep()`, `afterSuperstep()`, `closeAll()`.

3. **BSP Coordination**: Explicit barrier synchronization via etcd (EtcdBspClient). Each superstep follows:
   - `workerStepPrepareDone` → `waitMasterStepPrepareDone`
   - Local compute (vertices process messages)
   - `workerStepComputeDone` → `waitMasterStepComputeDone`
   - Aggregators/snapshots
   - `workerStepDone` → `waitMasterStepDone` (master returns SuperstepStat)

4. **Computation Contract**: Algorithms implement `Computation<M extends Value>`:
   - `compute0(context, vertex)`: Initialize at superstep 0
   - `compute(context, vertex, messages)`: Process messages in subsequent supersteps
   - Access to aggregators, combiners, and message sending via `ComputationContext`
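
The superstep ordering described under BSP Coordination can be sketched as follows. This is a minimal illustration, not the project's implementation: `SuperstepOrderSketch`, `RecordingBsp`, `signal`, `waitFor`, and `runSuperstep` are hypothetical names introduced here, and only the event names come from the list above. The sketch records calls in a list instead of talking to etcd.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperstepOrderSketch {

    /** Hypothetical stand-in for a worker-side BSP client; it only records calls. */
    static final class RecordingBsp {
        final List<String> calls = new ArrayList<>();

        // Worker reports that it finished a phase.
        void signal(String event) { calls.add(event); }

        // Worker would block here until the master releases the barrier (no-op in this sketch).
        void waitFor(String event) { calls.add(event); }
    }

    /** Runs one superstep in the documented order and returns the recorded trace. */
    static List<String> runSuperstep(RecordingBsp bsp, Runnable compute, Runnable aggregate) {
        bsp.signal("workerStepPrepareDone");
        bsp.waitFor("waitMasterStepPrepareDone");
        compute.run();                          // local compute: vertices process messages
        bsp.signal("workerStepComputeDone");
        bsp.waitFor("waitMasterStepComputeDone");
        aggregate.run();                        // aggregators / snapshots
        bsp.signal("workerStepDone");
        bsp.waitFor("waitMasterStepDone");      // the real master returns a SuperstepStat here
        return bsp.calls;
    }

    public static void main(String[] args) {
        RecordingBsp bsp = new RecordingBsp();
        List<String> trace = runSuperstep(bsp,
                () -> bsp.calls.add("compute"),
                () -> bsp.calls.add("aggregate"));
        trace.forEach(System.out::println);
    }
}
```

The point of the sketch is only the ordering: every local phase is bracketed by a worker signal and a wait on the master, so no worker starts the next phase before all workers have finished the current one.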

**Important Files:**
- Algorithm contract: `computer/computer-api/src/main/java/org/apache/hugegraph/computer/core/worker/Computation.java`
- Runtime orchestration: `computer/computer-core/src/main/java/org/apache/hugegraph/computer/core/worker/WorkerService.java`
- BSP coordination: `computer/computer-core/src/main/java/org/apache/hugegraph/computer/core/bsp/Bsp4Worker.java`
- Example algorithm: `computer/computer-algorithm/src/main/java/org/apache/hugegraph/computer/algorithm/centrality/pagerank/PageRank.java`
145+
### Vermeer (Go) - In-Memory Computing Engine
146+
147+
**Directory Structure:**
148+
- `algorithms/`: Go algorithm implementations (pagerank.go, sssp.go, louvain.go, etc.)
149+
- `apps/`:
150+
- `bsp/`: BSP coordination helpers
151+
- `graphio/`: HugeGraph I/O adapters (reads via gRPC to store/pd, writes via HTTP REST)
152+
- `master/`: Master scheduling, HTTP endpoints, worker management
153+
- `compute/`: Worker-side compute logic
154+
- `protos/`: Generated protobuf/gRPC definitions
155+
- `common/`: Utilities, logging, metrics
156+
- `client/`: Client libraries
157+
- `tools/`: Binary dependencies (supervisord, protoc)
158+
- `ui/`: Web UI assets
159+
160+
**Key Patterns:**
161+
162+
1. **Maker/Registry Pattern**: Graph loaders/writers register themselves via init() (e.g., `LoadMakers[LoadTypeHugegraph] = &HugegraphMaker{}`). Master selects loader by type.
163+
164+
2. **HugeGraph Integration**:
165+
- `hugegraph.go` implements HugegraphMaker, HugegraphLoader, HugegraphWriter
166+
- Queries PD via gRPC for partition metadata
167+
- Streams vertex/edge data via gRPC from store (ScanPartition)
168+
- Writes results back via HugeGraph HTTP REST API
169+
170+
3. **Master-Worker**: Master schedules LoadPartition tasks to workers, manages worker lifecycle via WorkerManager/WorkerClient, exposes HTTP admin endpoints.
171+
172+
**Important Files:**
173+
- HugeGraph integration: `vermeer/apps/graphio/hugegraph.go`
174+
- Master scheduling: `vermeer/apps/master/tasks/tasks.go`
175+
- Worker management: `vermeer/apps/master/workers/workers.go`
176+
- HTTP endpoints: `vermeer/apps/master/services/http_master.go`
177+
178+
## Integration with HugeGraph
179+
180+
**Computer (Java):**
181+
- `WorkerInputManager` reads vertices/edges from HugeGraph via `GraphFactory` abstraction
182+
- Graph data is partitioned and distributed to workers via input splits
183+
184+
**Vermeer (Go):**
185+
- Directly queries HugeGraph PD (metadata service) for partition information
186+
- Uses gRPC to stream graph data from HugeGraph store
187+
- Writes computed results back via HugeGraph HTTP REST API (adds properties to vertices)
188+
189+
## Development Workflow
190+
191+
**Adding a New Algorithm (Computer):**
192+
1. Create class in `computer-algorithm` implementing `Computation<MessageType>`
193+
2. Implement `compute0()` for initialization and `compute()` for message processing
194+
3. Use `context.sendMessage()` or `context.sendMessageToAllEdges()` for message passing
195+
4. Register aggregators in `beforeSuperstep()`, read/write in `compute()`
196+
5. Configure algorithm class name in job config
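
Steps 1 to 3 can be illustrated with a toy, single-process sketch. Everything here is an assumption made for illustration: the `Context`, `Vertex`, and `Computation` types below are simplified stand-ins defined inside the example, not the real `computer-api` types (whose signatures differ), and `MaxValue` and `run` are hypothetical names. The algorithm propagates the largest vertex value through the graph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxValueSketch {

    // Simplified stand-ins for illustration only; NOT the real computer-api types.
    interface Context {
        void sendMessageToAllEdges(Vertex vertex, long message);
    }

    static final class Vertex {
        final int id;
        long value;
        final int[] neighbors;

        Vertex(int id, long value, int... neighbors) {
            this.id = id;
            this.value = value;
            this.neighbors = neighbors;
        }
    }

    interface Computation {
        void compute0(Context context, Vertex vertex);
        void compute(Context context, Vertex vertex, List<Long> messages);
    }

    /** Each vertex ends up holding the maximum value reachable through its edges. */
    static final class MaxValue implements Computation {
        @Override
        public void compute0(Context context, Vertex vertex) {
            // Superstep 0: announce the initial value to all neighbors.
            context.sendMessageToAllEdges(vertex, vertex.value);
        }

        @Override
        public void compute(Context context, Vertex vertex, List<Long> messages) {
            long max = vertex.value;
            for (long m : messages) {
                max = Math.max(max, m);
            }
            if (max > vertex.value) {       // only forward when the value improved
                vertex.value = max;
                context.sendMessageToAllEdges(vertex, max);
            }
        }
    }

    /** Toy driver: runs supersteps until no messages are sent; returns the superstep count. */
    static int run(Computation algo, Map<Integer, Vertex> graph) {
        Map<Integer, List<Long>> inbox = new HashMap<>();
        int superstep = 0;
        while (true) {
            Map<Integer, List<Long>> outbox = new HashMap<>();
            Context ctx = (v, msg) -> {
                for (int n : v.neighbors) {
                    outbox.computeIfAbsent(n, k -> new ArrayList<>()).add(msg);
                }
            };
            for (Vertex v : graph.values()) {
                if (superstep == 0) {
                    algo.compute0(ctx, v);
                } else if (inbox.containsKey(v.id)) {
                    algo.compute(ctx, v, inbox.get(v.id));
                }
            }
            superstep++;
            if (outbox.isEmpty()) {
                return superstep;
            }
            inbox = outbox;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Vertex> graph = new HashMap<>();
        graph.put(1, new Vertex(1, 5, 2));
        graph.put(2, new Vertex(2, 1, 1, 3));
        graph.put(3, new Vertex(3, 9, 2));
        int steps = run(new MaxValue(), graph);
        System.out.println("supersteps=" + steps + ", v1=" + graph.get(1).value);
    }
}
```

In the real framework the runtime, not the algorithm, drives the supersteps and delivers messages. The sketch keeps the driver inline only so the `compute0()`/`compute()` split can be run in isolation.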

**K8s-Operator Development:**
- CRD classes are auto-generated; run `mvn clean install` in `computer-k8s-operator` first
- Generated classes appear in `computer-k8s/target/generated-sources/`
- CRD generation script: `computer-k8s-operator/crd-generate/Makefile`

**Vermeer Asset Updates:**
- Web UI assets must be regenerated after changes: `cd asset && go generate`
- Or use `make generate-assets` from vermeer root
- For dev mode with hot-reload: `go build -tags=dev`
208+
## Testing Notes
209+
210+
**Computer:**
211+
- Integration tests require etcd, HDFS, HugeGraph, and Kubernetes (see `.github/workflows/computer-ci.yml`)
212+
- Test environment setup scripts in `computer-dist/src/assembly/travis/`
213+
- Unit tests run in isolation without external dependencies
214+
215+
**Vermeer:**
216+
- Test scripts in `vermeer/test/`
217+
- Configuration files in `vermeer/config/` (master.ini, worker.ini templates)
218+
219+
## CI/CD
220+
221+
CI pipeline (`.github/workflows/computer-ci.yml`) runs:
222+
1. License check (Apache RAT)
223+
2. Setup HDFS (Hadoop 3.3.2)
224+
3. Setup Minikube/Kubernetes
225+
4. Load test data into HugeGraph
226+
5. Compile with Java 11
227+
6. Run integration tests (`-P integrate-test`)
228+
7. Run unit tests (`-P unit-test`)
229+
8. Upload coverage to Codecov
230+
231+
## Important Notes
232+
233+
- **Computer K8s module**: Must run `mvn clean install` before editing to generate CRD classes
234+
- **Java version**: Build requires JDK 11; HDFS dependencies require JDK 8
235+
- **Vermeer binary deps**: First-time builds need `make init` to download supervisord/protoc
236+
- **BSP coordination**: Computer uses etcd for barrier synchronization (configure via `BSP_ETCD_URL`)
237+
- **Memory management**: Both systems auto-manage memory by spilling to disk when needed
