Commit e0255d2

[Docs] Add LLM contribution guide
1 parent b9ee3bc commit e0255d2

2 files changed

+425
-194
lines changed

agent/AGENTS.md

Lines changed: 212 additions & 94 deletions
@@ -1,122 +1,240 @@
11
# LLM Context Guide for Apache SeaTunnel
22

3-
This guide helps AI assistants (LLMs/Agents) make safe, consistent, and verifiable changes to the SeaTunnel codebase. It mirrors practices from mature Apache projects and adapts them to SeaTunnel’s build, testing, and documentation conventions.
3+
This guide helps AI assistants (LLMs / Agents) make **safe, consistent, and verifiable** changes to the Apache SeaTunnel codebase. It mirrors practices from mature Apache projects and adapts them to SeaTunnel’s **build, testing, architecture, and documentation conventions**.
44

5-
⚠️ **CRITICAL: Validate Before Pushing**
6-
ALWAYS run verification commands before proposing changes.
7-
- **Format Code**: `./mvnw spotless:apply`
8-
- **Quick Verify**: `./mvnw -q -DskipTests verify`
9-
- **Unit Tests**: `./mvnw test`
5+
## ⚠️ CRITICAL: Validate Before Proposing Changes
6+
7+
**Agents MUST run verification commands locally before suggesting or finalizing changes.**
8+
9+
```bash
10+
# Format code (mandatory)
11+
./mvnw spotless:apply
12+
13+
# Quick verification (mandatory)
14+
./mvnw -q -DskipTests verify
15+
16+
# Unit tests (strongly recommended)
17+
./mvnw test
18+
```
19+
20+
Failure to meet these requirements will likely result in PR rejection.
1021

1122
## Git Commit Message Convention
12-
SeaTunnel follows a strict commit message format to maintain a clean history.
13-
**Format**: `[Type][Module] Description`
14-
15-
**Types**:
16-
- `Feature`: New features
17-
- `Fix`: Bug fixes
18-
- `Improve`: Improvements to existing features
19-
- `Docs`: Documentation changes
20-
- `Test`: Test cases or test framework changes
21-
- `Chore`: Build process, dependency updates, or maintenance
22-
23-
**Modules**:
24-
- `Connector-V2`: Changes in `seatunnel-connectors-v2`
25-
- `Zeta`: Changes in `seatunnel-engine` (Zeta engine)
26-
- `Core`: Changes in `seatunnel-core`
27-
- `API`: Changes in `seatunnel-api`
28-
- `E2E`: Changes in `seatunnel-e2e`
29-
- `Transform-V2`: Changes in `seatunnel-transforms-v2`
30-
- `Format`: Changes in `seatunnel-formats`
31-
- `Translation`: Changes in `seatunnel-translation`
32-
33-
**Examples**:
34-
- `[Fix][Connector-V2] Fix MySQL connector source split bug`
35-
- `[Fix][Zeta] Fix checkpoint timeout issue`
36-
- `[Feature][Transform-V2] Add LLM transform plugin`
37-
- `[Improve][Core] Optimize jar package loading speed`
38-
- `[Docs] Update quick start guide`
39-
40-
## Key Directories
23+
24+
SeaTunnel follows a **strict commit message format** to maintain a clean and searchable history.
25+
26+
**Format**:
27+
28+
```
29+
[Type][Module] Description
30+
```
31+
32+
### Types
33+
34+
* `Feature` – New features
35+
* `Fix` – Bug fixes
36+
* `Improve` – Improvements to existing behavior
37+
* `Docs` – Documentation-only changes
38+
* `Test` – Test cases or test framework changes
39+
* `Chore` – Build, dependency, or maintenance tasks
40+
41+
### Modules
42+
43+
* `Connector-V2` – seatunnel-connectors-v2
44+
* `Zeta` – seatunnel-engine (Zeta engine)
45+
* `Core` – seatunnel-core
46+
* `API` – seatunnel-api
47+
* `Transform-V2` – seatunnel-transforms-v2
48+
* `Format` – seatunnel-formats
49+
* `Translation` – seatunnel-translation
50+
* `E2E` – seatunnel-e2e
51+
52+
### Examples
53+
54+
* `[Fix][Connector-V2] Fix MySQL source split enumeration bug`
55+
* `[Fix][Zeta] Fix checkpoint timeout under heavy backpressure`
56+
* `[Feature][Transform-V2] Add LLM transform plugin`
57+
* `[Improve][Core] Optimize jar package loading speed`
58+
* `[Docs] Update quick start guide`
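
The format above can be checked mechanically before a commit is proposed. The following is a minimal sketch, not part of SeaTunnel: the class name `CommitMessageCheck` is hypothetical, and treating the module tag as optional is an assumption based on the `[Docs] Update quick start guide` example.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a SeaTunnel class: checks "[Type][Module] Description".
final class CommitMessageCheck {
    private static final String TYPES = "Feature|Fix|Improve|Docs|Test|Chore";
    private static final String MODULES =
            "Connector-V2|Zeta|Core|API|Transform-V2|Format|Translation|E2E";

    // Assumption: the module tag is optional, because the guide's own
    // "[Docs] Update quick start guide" example omits it.
    private static final Pattern FORMAT =
            Pattern.compile("^\\[(" + TYPES + ")\\](\\[(" + MODULES + ")\\])? \\S.*$");

    static boolean isValid(String subject) {
        return subject != null && FORMAT.matcher(subject).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("[Fix][Zeta] Fix checkpoint timeout under heavy backpressure"));
        System.out.println(isValid("[Docs] Update quick start guide"));
        System.out.println(isValid("fix: checkpoint timeout"));
    }
}
```

The first two subjects are accepted and the third is rejected, since it has no bracketed type tag.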
59+
60+
## Repository Structure
61+
4162
```text
4263
seatunnel/
4364
├── seatunnel-api/ # Core API definitions
44-
├── seatunnel-connectors-v2/ # Source & Sink connectors (Main contribution area)
65+
├── seatunnel-connectors-v2/ # Source & Sink connectors (main contribution area)
4566
├── seatunnel-transforms-v2/ # Transform plugins (including LLM)
46-
├── seatunnel-engine/ # SeaTunnel Zeta Engine & Web UI
67+
├── seatunnel-engine/ # Zeta engine & Web UI
4768
├── seatunnel-core/ # Job submission & CLI entry points
48-
├── seatunnel-translation/ # Adapters for Flink & Spark
49-
├── seatunnel-formats/ # Data format handling (JSON, Avro, etc.)
69+
├── seatunnel-translation/ # Flink & Spark adapters
70+
├── seatunnel-formats/ # Data formats (JSON, Avro, etc.)
5071
├── seatunnel-e2e/ # End-to-End integration tests
5172
├── docs/ # Documentation (en & zh)
5273
└── config/ # Default configurations
5374
```
5475

5576
## Code Standards
56-
**Java Backend**
57-
- **Style**: Google Java Format (AOSP style). Enforced by Spotless.
58-
- **Imports**: No wildcard imports. `org.apache.seatunnel.shade.*` must be used for shaded dependencies (Guava, Jetty, Hikari, Janino, Commons-Lang3).
59-
- **License Header**: All new files must include the standard Apache Software Foundation license header.
60-
61-
**Apache License Headers**
62-
- **Requirement**: New files require ASF license headers.
63-
- **Header Content**:
64-
```java
65-
/*
66-
* Licensed to the Apache Software Foundation (ASF) under one or more
67-
* contributor license agreements. See the NOTICE file distributed with
68-
* this work for additional information regarding copyright ownership.
69-
* The ASF licenses this file to You under the Apache License, Version 2.0
70-
* (the "License"); you may not use this file except in compliance with
71-
* the License. You may obtain a copy of the License at
72-
*
73-
* http://www.apache.org/licenses/LICENSE-2.0
74-
*
75-
* Unless required by applicable law or agreed to in writing, software
76-
* distributed under the License is distributed on an "AS IS" BASIS,
77-
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
78-
* See the License for the specific language governing permissions and
79-
* limitations under the License.
80-
*/
81-
```
82-
83-
**Documentation**
84-
- **Bilingual**: User-visible changes MUST update both `docs/en` and `docs/zh`.
85-
- **Consistency**: Config options in docs must match the code implementation.
86-
87-
## Architecture Patterns
88-
**Connectors (V2)**
89-
- Implement `SeaTunnelSource` or `SeaTunnelSink`.
90-
- Use `Option` rule for configuration definition.
91-
- Support `SourceSplitEnumerator` for parallel reading.
92-
93-
**Engine (Zeta)**
94-
- **Client**: Submits job config to Master.
95-
- **Master**: Schedules tasks to Workers.
96-
- **Worker**: Executes tasks (Source -> Transform -> Sink).
97-
98-
## Test Utilities
99-
**Unit Tests**
100-
- Run with `./mvnw test`.
101-
- Located in `src/test/java` of each module.
102-
103-
**E2E Tests (`seatunnel-e2e`)**
104-
- Uses Testcontainers to spin up docker environments.
105-
- Define test cases extending `TestSuiteBase`.
106-
- **Command**: `./mvnw -DskipUT -DskipIT=false verify` (Runs ITs, can be slow).
77+
78+
### Java Backend
79+
80+
* **Formatting**: Google Java Format (AOSP style), enforced by Spotless
81+
* **Imports**:
82+
83+
* No wildcard imports
84+
* Use shaded dependencies: `org.apache.seatunnel.shade.*`
85+
* **Nullability**: Avoid implicit null assumptions
86+
* **Visibility**: Keep APIs minimal; prefer package-private when possible
87+
88+
### Apache License Header (MANDATORY)
89+
90+
All **new files** MUST include the ASF license header:
91+
92+
```java
93+
/*
94+
* Licensed to the Apache Software Foundation (ASF) under one or more
95+
* contributor license agreements. See the NOTICE file distributed with
96+
* this work for additional information regarding copyright ownership.
97+
* The ASF licenses this file to You under the Apache License, Version 2.0
98+
* (the "License"); you may not use this file except in compliance with
99+
* the License. You may obtain a copy of the License at
100+
*
101+
* http://www.apache.org/licenses/LICENSE-2.0
102+
*
103+
* Unless required by applicable law or agreed to in writing, software
104+
* distributed under the License is distributed on an "AS IS" BASIS,
105+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
106+
* See the License for the specific language governing permissions and
107+
* limitations under the License.
108+
*/
109+
```
110+
111+
## 🚨 Backward Compatibility (VERY IMPORTANT)
112+
113+
Agents MUST treat backward compatibility as a **hard constraint**.
114+
115+
* DO NOT remove or rename existing config options
116+
* DO NOT change default values without a documented justification
117+
* DO NOT break public APIs or SPI contracts
118+
119+
Any incompatible change MUST:
120+
121+
* Be explicitly documented
122+
* Include migration guidance
123+
* Be clearly explained in the PR description
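
One common way to keep an option change compatible is to keep accepting the old key while steering users to the new one. The sketch below illustrates that idea under stated assumptions: the key names, the default value, and the `CompatibleConfigReader` helper are all hypothetical and do not come from SeaTunnel, which has its own configuration classes.

```java
import java.util.Map;

// Hypothetical sketch: key names and this helper are illustrative, not SeaTunnel APIs.
final class CompatibleConfigReader {
    static final String NEW_KEY = "fetch_size";     // hypothetical current option name
    static final String LEGACY_KEY = "fetch.size";  // hypothetical old name, still accepted
    static final int DEFAULT_FETCH_SIZE = 1024;     // default value is left unchanged

    static int fetchSize(Map<String, String> config) {
        String raw = config.get(NEW_KEY);
        if (raw == null && config.containsKey(LEGACY_KEY)) {
            // The old key keeps working; the warning doubles as migration guidance.
            raw = config.get(LEGACY_KEY);
            System.err.println(
                    "Option '" + LEGACY_KEY + "' is deprecated; use '" + NEW_KEY + "' instead");
        }
        return raw == null ? DEFAULT_FETCH_SIZE : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        System.out.println(fetchSize(Map.of(LEGACY_KEY, "500")));
        System.out.println(fetchSize(Map.of()));
    }
}
```

Existing job configs that still use the old key continue to run, the new key wins when both are present, and the default stays the same for users who set neither.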
124+
125+
## Dependency Rules
126+
127+
* DO NOT introduce new dependencies unless absolutely necessary
128+
* Prefer existing shaded dependencies under `org.apache.seatunnel.shade.*`
129+
* Any new dependency MUST:
130+
131+
* Be justified in the PR description
132+
* Consider shading, size, and conflict risks
133+
134+
## Architecture Guidelines
135+
136+
### Connector (V2)
137+
138+
* Implement `SeaTunnelSource` or `SeaTunnelSink`
139+
* Define configs using `Option`
140+
* Support parallelism via `SourceSplitEnumerator`
141+
* Avoid connector-specific logic leaking into engine or core
142+
143+
### Zeta Engine
144+
145+
* **Client**: Submits job config
146+
* **Master**: Schedules & coordinates
147+
* **Worker**: Executes tasks (Source → Transform → Sink)
148+
149+
Respect task boundaries and lifecycle semantics.
150+
151+
## Configuration (Option) Rules
152+
153+
* All user-facing configs MUST be defined using `Option`
154+
* Each option MUST include:
155+
156+
* name
157+
* type
158+
* default value (if applicable)
159+
* clear description
160+
* Option names are **stable contracts**; treat any rename as an incompatible change
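
To make the four required parts concrete, here is a simplified stand-in written for illustration only. `OptionSketch` is a hypothetical class and is not SeaTunnel's `Option` API; the option names and defaults are invented. Real connectors should follow the option definitions already present in the codebase.

```java
import java.util.Objects;

// Simplified stand-in for illustration only; SeaTunnel's real Option API differs.
final class OptionSketch<T> {
    final String name;
    final Class<T> type;
    final T defaultValue; // null means "no default"
    final String description;

    OptionSketch(String name, Class<T> type, T defaultValue, String description) {
        this.name = Objects.requireNonNull(name, "name");
        this.type = Objects.requireNonNull(type, "type");
        this.defaultValue = defaultValue;
        // Reject options without a description so the rule cannot be skipped.
        if (description == null || description.trim().isEmpty()) {
            throw new IllegalArgumentException("Option '" + name + "' needs a description");
        }
        this.description = description;
    }

    // Hypothetical options for an imaginary sink connector.
    static final OptionSketch<Integer> BATCH_SIZE =
            new OptionSketch<>("batch_size", Integer.class, 1000, "Rows written per flush.");
    static final OptionSketch<String> TABLE =
            new OptionSketch<>("table", String.class, null, "Target table name (required).");

    public static void main(String[] args) {
        System.out.println(BATCH_SIZE.name + " default=" + BATCH_SIZE.defaultValue);
    }
}
```

Declaring each option once as a constant keeps the name, type, default, and description in a single place, which also makes it easier to keep the documentation in sync with the code.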
161+
162+
## Error Handling & Logging
163+
164+
* Exceptions MUST include sufficient context (table, task, config key)
165+
* Avoid swallowing exceptions
166+
* Use proper log levels:
167+
168+
* INFO – lifecycle events
169+
* WARN – recoverable issues
170+
* ERROR – task-failing errors
171+
* NEVER log sensitive information (passwords, tokens, credentials)
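
These rules can be sketched in a few lines. The helper below is hypothetical (`ErrorContextSketch`, the list of sensitive key names, and the message wording are all assumptions for illustration); it shows an exception that carries table and task context while preserving the cause, and a config copy that is safe to log.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers for illustration; none of these names are SeaTunnel APIs.
final class ErrorContextSketch {
    // Assumed set of sensitive keys; a real connector would list its own.
    private static final Set<String> SENSITIVE_KEYS = Set.of("password", "token", "secret");

    // Returns a copy of the config with sensitive values masked, safe to log.
    static Map<String, String> maskSensitive(Map<String, String> config) {
        Map<String, String> safe = new LinkedHashMap<>();
        config.forEach((key, value) ->
                safe.put(key, SENSITIVE_KEYS.contains(key.toLowerCase(Locale.ROOT)) ? "******" : value));
        return safe;
    }

    // Wraps the cause instead of swallowing it, and names the table and task.
    static RuntimeException writeFailed(String table, int taskId, Throwable cause) {
        return new RuntimeException(
                "Failed to write to table '" + table + "' in task " + taskId, cause);
    }

    public static void main(String[] args) {
        System.out.println(maskSensitive(Map.of("password", "hunter2")));
        System.out.println(writeFailed("orders", 3, new IllegalStateException("timeout")).getMessage());
    }
}
```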
172+
173+
## Documentation Rules
174+
175+
* Any user-visible change MUST update:
176+
177+
* `docs/en`
178+
* `docs/zh`
179+
* Config names, defaults, and examples MUST match the code exactly
180+
* Documentation is part of the feature, not an afterthought
181+
182+
## Testing Guidelines
183+
184+
### Unit Tests
185+
186+
* Located under `src/test/java`
187+
* Validate behavior, not implementation details
188+
* Prefer deterministic and minimal tests
189+
190+
Command:
191+
192+
```bash
193+
./mvnw test
194+
```
195+
196+
### E2E Tests
197+
198+
* Located in `seatunnel-e2e`
199+
* Use Testcontainers to start the required Docker environments
200+
* Extend `TestSuiteBase`
201+
202+
Command:
203+
204+
```bash
205+
./mvnw -DskipUT -DskipIT=false verify
206+
```
207+
208+
## Performance Awareness
209+
210+
Agents MUST consider performance implications:
211+
212+
* Avoid unnecessary object creation in hot paths
213+
* Be cautious with large in-memory buffers
214+
* Consider parallelism and resource usage
215+
216+
## PR Scope Rule
217+
218+
* Keep changes minimal and focused
219+
* Avoid unrelated refactors or formatting-only changes
220+
* One PR should solve **one problem**
107221

108222
## Running & Debugging
109-
**Build from Source**
223+
224+
### Build from Source
225+
110226
```bash
111227
./mvnw clean install -DskipTests -Dskip.spotless=true
112228
```
113229

114-
**Install Connectors**
230+
### Install Connectors
231+
115232
```bash
116-
sh bin/install-plugin.sh 2.3.13 # Or specific version
233+
sh bin/install-plugin.sh 2.3.13
117234
```
118235

119-
**Run Job (Zeta)**
236+
### Run Job (Zeta)
237+
120238
```bash
121239
sh bin/seatunnel.sh --config config/v2.batch.config.template -e local
122240
```
