
[Feature Request]: Expose parent-child chunking configuration via HTTP API and Python SDK #13857

@void-travellers


Is your feature request related to a problem?

Parent-child chunking was introduced in v0.23.0 (PRs #11598, #11629, #11810, #11997) and can be configured in the web UI on the dataset Configuration page (the "Child chunks are used for retrieval" toggle and the child chunk delimiter setting). However, this configuration is not exposed through the HTTP API or the Python SDK.
For chunk_method: "naive", the parser_config object currently accepts auto_keywords, auto_questions, chunk_token_num, delimiter, html4excel, layout_recognize, tag_kb_ids, task_page_size, raptor, and graphrag. It has no parameter for enabling child chunking or for setting the child chunk delimiter.
This means users who manage datasets programmatically (via API-driven ingestion pipelines) cannot enable parent-child chunking without manually toggling it in the UI for each dataset.

Describe the feature you'd like

Add parent-child chunking parameters to parser_config for the naive chunk method in both the HTTP API (POST /api/v1/datasets, PUT /api/v1/datasets/{dataset_id}) and the Python SDK (RAGFlow.create_dataset(), DataSet.update()).
Suggested shape (adapt to match internal field names):

"parser_config": {
    "chunk_token_num": 512,
    "delimiter": "\n",
    "child_chunking": {
        "enabled": true,
        "delimiter": "\n"
    }
}
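
A minimal sketch of how an API-driven ingestion pipeline might send the suggested shape to both endpoints, assuming the field is named child_chunking as proposed above (a hypothetical name until the maintainers choose one; the API does not accept it today). It uses only the standard library and builds the requests without sending them. BASE_URL, API_KEY, and the helper functions are placeholders, and the Bearer-token header follows the scheme the existing dataset endpoints use.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9380"  # placeholder RAGFlow server address
API_KEY = "YOUR_API_KEY"            # placeholder API key


def naive_parser_config(child_delimiter: str = "\n") -> dict:
    """Build a naive-method parser_config with the proposed (hypothetical) child_chunking block."""
    return {
        "chunk_token_num": 512,
        "delimiter": "\n",
        "child_chunking": {  # proposed field, not accepted by the API today
            "enabled": True,
            "delimiter": child_delimiter,
        },
    }


def _json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated JSON request."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )


def create_dataset_request(name: str) -> urllib.request.Request:
    # POST /api/v1/datasets: create a dataset with child chunking enabled
    return _json_request("POST", "/api/v1/datasets", {
        "name": name,
        "chunk_method": "naive",
        "parser_config": naive_parser_config(),
    })


def update_dataset_request(dataset_id: str) -> urllib.request.Request:
    # PUT /api/v1/datasets/{dataset_id}: enable child chunking on an existing dataset
    return _json_request("PUT", f"/api/v1/datasets/{dataset_id}", {
        "chunk_method": "naive",
        "parser_config": naive_parser_config(),
    })
```

Once the API accepts the field, passing either request to urllib.request.urlopen() would send it. Until then, the server would either reject or ignore the extra key, which is exactly the gap this issue describes.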
Additional context

The backend implementation already exists, so this request is only about wiring the existing configuration to the API surface. The UI already writes this configuration to the dataset; the API should be able to do the same.
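
On the server side, the wiring could be as small as validating the new block and copying it onto the keys the UI already persists. The following is a minimal sketch, assuming the API-facing shape suggested above. INTERNAL_ENABLED_KEY and INTERNAL_DELIMITER_KEY are hypothetical stand-ins for whatever field names the UI actually writes (this issue does not name them), and the defaults (disabled, "\n" delimiter) are assumptions.

```python
from typing import Any

# Hypothetical internal field names; replace with the keys the web UI actually writes.
INTERNAL_ENABLED_KEY = "child_chunking_enabled"
INTERNAL_DELIMITER_KEY = "child_chunking_delimiter"


def normalize_child_chunking(parser_config: dict[str, Any]) -> dict[str, Any]:
    """Validate the proposed child_chunking block and map it to internal keys.

    Returns a new dict and leaves the input unchanged. Raises ValueError on bad input.
    """
    config = dict(parser_config)
    block = config.pop("child_chunking", None)
    if block is None:
        return config  # field omitted: keep the dataset's existing behaviour

    if not isinstance(block, dict):
        raise ValueError("child_chunking must be an object")
    unknown = set(block) - {"enabled", "delimiter"}
    if unknown:
        raise ValueError(f"unknown child_chunking keys: {sorted(unknown)}")

    enabled = block.get("enabled", False)  # assumed default: disabled
    if not isinstance(enabled, bool):
        raise ValueError("child_chunking.enabled must be a boolean")

    delimiter = block.get("delimiter", "\n")  # assumed default delimiter
    if not isinstance(delimiter, str) or not delimiter:
        raise ValueError("child_chunking.delimiter must be a non-empty string")

    config[INTERNAL_ENABLED_KEY] = enabled
    config[INTERNAL_DELIMITER_KEY] = delimiter
    return config
```

Rejecting unknown keys and non-boolean values up front keeps API-created datasets from silently diverging from what the UI would have written.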

Labels: 💞 feature (Feature request, pull request that fulfills a new feature.)
