feat/custom fallback for language detection #4091

@lwollenbergfuzzy

Description

Short text is now automatically assigned English as its language, which might not be suitable.
This behavior prevents the user from detecting this case and applying their own solution.

Either return None and let the user handle it, or allow a custom fallback, for instance a user-supplied callable that performs the language detection.
Another possibility is to allow language detection to be turned off. This can already be done for the auto partition with languages=[""], but not all partitioners allow this, for example the Markdown partitioner (partition_md).
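
To make the two proposed behaviors concrete, here is a minimal sketch of what I have in mind. It is not the library's current code: `detect_with_fallback`, the `fallback` parameter, and the `MIN_WORDS` threshold are all hypothetical names and values chosen for illustration, and the real detector is passed in as a plain callable so the example stays self-contained.

```python
from typing import Callable, List, Optional

# Hypothetical type: a detector takes text and returns language codes, or None.
LanguageDetector = Callable[[str], Optional[List[str]]]

# Assumed cutoff for "short" text; illustrative only, not taken from the library.
MIN_WORDS = 5


def detect_with_fallback(
    text: str,
    detector: LanguageDetector,
    fallback: Optional[LanguageDetector] = None,
    min_words: int = MIN_WORDS,
) -> Optional[List[str]]:
    """Detect languages, but never silently default short text to English.

    Short text is handed to the user-supplied `fallback` if one is given;
    otherwise None is returned so the caller can see detection was skipped.
    """
    if len(text.split()) < min_words:
        return fallback(text) if fallback is not None else None
    return detector(text)


# Stand-in for the real detector, used only to keep the sketch runnable.
def always_english(text: str) -> List[str]:
    return ["eng"]


# No fallback: the caller gets None and can decide what to do.
print(detect_with_fallback("Guten Tag", always_english))

# Custom fallback: the user's own callable decides for short text.
print(detect_with_fallback("Guten Tag", always_english, fallback=lambda t: ["deu"]))
```

Returning None by default keeps the change explicit for callers, while the optional callable lets users plug in their own detection without the partitioner needing to know anything about it.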

Metadata

Labels: enhancement (New feature or request)
