Commit d199368

Merge pull request #10 from StarRocks/ast
Ast
2 parents 5a49125 + 64983eb commit d199368

7 files changed: +2038 additions, -546 deletions

README.md

Lines changed: 22 additions & 5 deletions
@@ -25,12 +25,27 @@ This code and most of the README are from the team at [PlayCanvas](https://githu
  preserve structure)
  --suffix <suffix> Custom suffix for output files (default: language
  name)
- --completeness <mode> Completeness check mode: warn, fail, or off (default:
- warn)
- --log-chunk-metadata Log API metadata for each chunk (and on mismatches)
+ --log-chunk-metadata Log API metadata for each chunk
  -h, --help display help for command
  ```

+ The translator now uses the AST pipeline by default.
+
+ ### Interpreting AST parse failures
+
+ In AST mode, each chunk asks the model to return a strict JSON array of `{ id, text }` items.
+
+ - Parse errors such as `Expected ',' or '}'` or `Expected ':' after property name` usually mean the model returned malformed JSON for that chunk.
+ - These are response-format failures, not semantic translation failures.
+ - `finishReason: STOP` with parse errors means the output completed, but the JSON structure was invalid.
+ - When you see `json repair retry`, the tool requested a strict JSON retry and recovered automatically.
+ - When you see `split fallback recovered X/Y missing ids`, the tool retried unresolved IDs in smaller sub-batches and merged recovered results back into the chunk.
+
+ How to read the outcome:
+
+ - `AST completeness check: Translated IDs N/N - ✅ PASS` means the chunk is fully recovered, even if repair notes are present.
+ - Missing IDs after all retries are the only case that indicates unresolved chunk-level translation for those specific items.
+
  ## Quick Start

  1. cd into the root of this repo
@@ -153,6 +168,9 @@ md-translate translate -i docs/guide.md -l French -o docs/guide_fr.md

  # Translate using API key argument
  md-translate translate -i file.md -l German --key your-api-key
+
+ # Translate with AST mode (default)
+ md-translate translate -i examples/External_table.md -l Japanese
  ```

  ### Batch Processing
@@ -189,8 +207,7 @@ Options:
  -k, --key <apikey> Google Gemini API key (optional)
  --flat Use flat structure in output directory (default: preserve structure)
  --suffix <suffix> Custom suffix for output files (default: language name)
- --completeness <mode> Completeness check mode: warn, fail, or off (default: warn)
- --log-chunk-metadata Log API metadata for each chunk (and on mismatches)
+ --log-chunk-metadata Log API metadata for each chunk
  ```

  #### `languages` - List supported languages
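The README's "Interpreting AST parse failures" section describes a per-chunk contract: the model returns a JSON array of `{ id, text }` items, malformed JSON leads to a strict JSON retry, and IDs that are still missing are retried in smaller sub-batches before the completeness check. The following is a minimal JavaScript sketch of that flow, written only to make the log messages easier to interpret. It is not the code in `src/translator_ast_mvp.js`: `parseChunkResponse`, `findMissingIds`, `splitIntoBatches`, and `completenessReport` are hypothetical helpers, and the `FAIL` label is an assumption, since the README only shows the `PASS` form.

```javascript
// Hypothetical helpers illustrating the AST-mode chunk contract described in
// the README; they are NOT taken from src/translator_ast_mvp.js.

// Parse a response that should be a strict JSON array of { id, text } items.
// Returns { items, error } instead of throwing so a caller can decide to retry.
function parseChunkResponse(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    // Malformed JSON (e.g. a missing ',' or ':') surfaces as a SyntaxError;
    // the exact message text depends on the JavaScript engine.
    return { items: [], error: err.message };
  }
  if (!Array.isArray(data)) {
    return { items: [], error: 'response is not a JSON array' };
  }
  // Keep only well-formed entries; anything else counts as missing later.
  const items = data.filter(
    (item) =>
      item !== null &&
      typeof item === 'object' &&
      'id' in item &&
      typeof item.text === 'string'
  );
  return { items, error: null };
}

// IDs that were requested for the chunk but not returned by the model.
function findMissingIds(requestedIds, items) {
  const returned = new Set(items.map((item) => item.id));
  return requestedIds.filter((id) => !returned.has(id));
}

// Split unresolved IDs into smaller sub-batches for a fallback retry.
function splitIntoBatches(ids, batchSize) {
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Summary line modeled loosely on the README's completeness message.
function completenessReport(requestedIds, items) {
  const missing = findMissingIds(requestedIds, items);
  const translated = requestedIds.length - missing.length;
  const status = missing.length === 0 ? 'PASS' : 'FAIL'; // 'FAIL' is assumed
  return `Translated IDs ${translated}/${requestedIds.length} - ${status}`;
}

// Usage with made-up IDs and responses.
const requested = ['n1', 'n2', 'n3'];
const broken = parseChunkResponse('[{"id": "n1" "text": "Hola"}]');
console.log(broken.error !== null); // true: a caller could request a strict JSON retry here
const partial = parseChunkResponse('[{"id":"n1","text":"Hola"},{"id":"n3","text":"Adiós"}]');
console.log(findMissingIds(requested, partial.items)); // [ 'n2' ]
console.log(splitIntoBatches(['a', 'b', 'c', 'd', 'e'], 2)); // [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
console.log(completenessReport(requested, partial.items)); // Translated IDs 2/3 - FAIL
```

Treating a parse error as "zero items returned" is what lets this sketch mirror the README's distinction: a malformed response only becomes a translation problem if its IDs are still missing after every retry.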

bin/cli.js

Lines changed: 7 additions & 5 deletions
@@ -8,7 +8,7 @@ import fs from 'fs-extra';
  import ora from 'ora';


- import MarkdownTranslator from '../src/translator.js';
+ import AstMarkdownTranslator from '../src/translator_ast_mvp.js';

  const program = new Command();

@@ -29,8 +29,8 @@ program
  .command('translate')
  .description('Translate markdown files to specified language')
  .requiredOption('-i, --input <pattern>', 'Input file path or glob pattern (e.g., "*.md", "docs/**/*.md")')
- .requiredOption('-l, --language <lang>', 'Target language (e.g., Spanish, French, German)')
- .option('-s, --source <lang>', 'Source language (default: English)')
+ .requiredOption('-l, --language <lang>', 'Target language (e.g., Spanish, French, German)')
+ .option('-s, --source <lang>', 'Source language (default: English)')
  .option('-o, --output <file>', 'Output file path (for single file translation)')
  .option('-d, --output-dir <dir>', 'Output directory (for batch translation or single file)')
  .option('-k, --key <apikey>', 'Google Gemini API key (or set GEMINI_API_KEY env var)')
@@ -51,7 +51,7 @@ program
  }

  // Initialize translator
- const translator = new MarkdownTranslator(apiKey);
+ const translator = new AstMarkdownTranslator(apiKey);

  // Check if input is a glob pattern (contains wildcards or multiple matches)
  const inputPattern = options.input;
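The context comment `// Check if input is a glob pattern (contains wildcards or multiple matches)` describes a two-part test whose body lies outside this hunk. A minimal sketch of that decision, assuming "wildcards" means the usual glob metacharacters and that the list of matched files comes from a glob expansion step not shown in this diff; `looksLikeGlob` and `isBatchInput` are hypothetical names, not functions in `bin/cli.js`:

```javascript
// Hypothetical: true when the input contains common glob metacharacters.
// The character class matches *, ?, [, ], { or }.
function looksLikeGlob(input) {
  return /[*?[\]{}]/.test(input);
}

// Hypothetical: treat the run as a batch when the input is a pattern or when
// expanding it produced more than one file (matchedFiles is assumed to be the
// already-expanded list of paths).
function isBatchInput(input, matchedFiles) {
  return looksLikeGlob(input) || matchedFiles.length > 1;
}

console.log(looksLikeGlob('docs/**/*.md')); // true
console.log(looksLikeGlob('docs/guide.md')); // false
console.log(isBatchInput('docs/guide.md', ['docs/guide.md'])); // false
```

A path that literally contains `[` or `{` would be misclassified by this heuristic; an implementation might check whether the input exists as a plain file before treating it as a pattern.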
@@ -78,6 +78,7 @@ program
  console.log(chalk.gray(` Source: ${options.source || 'English'}`));
  console.log(chalk.gray(` Language: ${options.language}`));
  console.log(chalk.gray(` Structure: ${options.flat ? 'Flat' : 'Preserved'}`));
+ console.log(chalk.gray(' Mode: AST'));
  console.log('');

  // Create progress handler
@@ -174,6 +175,7 @@ program
  console.log(chalk.gray(` Output: ${outputPath}`));
  console.log(chalk.gray(` Source: ${options.source || 'English'}`));
  console.log(chalk.gray(` Language: ${options.language}`));
+ console.log(chalk.gray(' Mode: AST'));
  console.log('');

  // Create progress spinner
@@ -230,7 +232,7 @@ program
  console.log(chalk.blue('🌍 Supported Languages:'));
  console.log('');

- const languages = MarkdownTranslator.getSupportedLanguages();
+ const languages = AstMarkdownTranslator.getSupportedLanguages();
  const columns = 3;
  const rows = Math.ceil(languages.length / columns);
