Spaces before XML tags are randomly removed in the translation and quotes around the tags might end up within the tag #48

Open
@funnel20

Description

Describe the bug
When using the latest version (1.13.0) of the DeepL Node.js library, I noticed an issue with XML tags.
With the source text

Please start your '<x id=p1>Basic</x>' plan by clicking the button '<x id=p2>Accept</x>'.

the translation changes the quotes and spacing around the <x></x> tags.

To Reproduce
Steps to reproduce the behavior:

  1. Integrate the API correctly in a Node.js project.
  2. Call:
    const result = await translator.translateText("Please start your '<x id=p1>Basic</x>' plan by clicking the button '<x id=p2>Accept</x>'.", "en", "de", { tagHandling: 'xml' });
  3. Console output with the German translation:
    Bitte starten Sie Ihren<x id=p1>'Basic</x>'-Plan, indem Sie auf die Schaltfläche<x id=p2>'Akzeptieren</x>' klicken.
    
  4. Analysis of the translated syntax:
  • In the English source, the tags are surrounded by single quotes ('<x id=p1>Basic</x>'), while in the German output the opening quote has moved inside the element (<x id=p1>'Basic</x>').
  • In the English source, there is a space in front of each quoted tag (your '<x id=p1>Basic</x>'), while in the German output the opening tag is attached directly to the previous word (Ihren<x id=p1>'Basic</x>').
    The expected output is:
    Bitte starten Sie Ihren '<x id=p1>Basic</x>'-Plan, indem Sie auf die Schaltfläche '<x id=p2>Akzeptieren</x>' klicken.
    
  5. Adding the options preserveFormatting: true, outlineDetection: true and nonSplittingTags: ['x'], individually or in any combination, produces the same German output string.
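
The two symptoms from step 4 can also be detected mechanically, which helps when checking many strings at once. The following is a minimal sketch in plain JavaScript that does not call the DeepL API; tagContexts and findTagContextIssues are hypothetical helper names, and the regular expression only recognizes <x id=...> tags like the ones in this report.

```javascript
// Hypothetical helpers for this report; not part of deepl-node.

// Matches an opening <x> tag and captures its id, whether the value is
// double-quoted (id="p1"), single-quoted (id='p1') or unquoted (id=p1).
const OPEN_TAG = /<x\s+id=(?:"([^"]*)"|'([^']*)'|([^\s>]+))\s*>/g;

// Straight and typographic quote characters, in case the translation
// uses different quote marks than the source.
const QUOTES = new Set(["'", '"', "\u2018", "\u2019", "\u201A", "\u201C", "\u201D", "\u201E"]);

// For every opening <x> tag, records whether a quote sits directly before it,
// whether a quote sits directly after it (inside the element), and whether
// whitespace separates the tag (or the quote in front of it) from the previous word.
function tagContexts(text) {
  const contexts = new Map();
  for (const m of text.matchAll(OPEN_TAG)) {
    const id = m[1] ?? m[2] ?? m[3];
    const start = m.index;
    const end = start + m[0].length;
    const before = text[start - 1] ?? "";
    const quoteOutside = QUOTES.has(before);
    const charBeforeQuote = text[start - 2] ?? "";
    contexts.set(id, {
      quoteOutside,
      quoteInside: QUOTES.has(text[end] ?? ""),
      spaceBefore: /\s/.test(quoteOutside ? charBeforeQuote : before),
    });
  }
  return contexts;
}

// Compares source and translation tag by tag (matched on id) and lists
// the quote and space problems described in step 4.
function findTagContextIssues(source, translation) {
  const src = tagContexts(source);
  const dst = tagContexts(translation);
  const issues = [];
  for (const [id, s] of src) {
    const t = dst.get(id);
    if (!t) {
      issues.push(`<x id=${id}> missing from translation`);
      continue;
    }
    if (s.quoteOutside && !t.quoteOutside && t.quoteInside) {
      issues.push(`quote moved inside <x id=${id}>`);
    }
    if (s.spaceBefore && !t.spaceBefore) {
      issues.push(`space before <x id=${id}> lost`);
    }
  }
  return issues;
}
```

For the source and German output above, findTagContextIssues reports a moved quote and a lost space for both p1 and p2, and an empty list for the expected output. Because ids are compared by value, quoted and unquoted id attributes are treated the same.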

Expected behavior
I expect formatting characters (such as spaces) and other non-translatable characters (such as quotes) around tags to be preserved, especially when the preserveFormatting option is set to true.
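
Until the API preserves these characters, one possible client-side stopgap is to post-process the translation. The sketch below is hypothetical and not part of deepl-node: it assumes every tagged term in the source is written as word '<x ...>text</x>' with straight single quotes, and that the target language separates words with spaces, so it would give wrong results for other patterns.

```javascript
// HYPOTHETICAL client-side workaround, not a DeepL or deepl-node feature.
// Assumes the source always uses the pattern  word '<x ...>text</x>'
// with straight single quotes, and a target language with spaces between words.
const MISPLACED_QUOTE = /(\p{L}|\p{N})?(\s*)(<x\b[^>]*>)'/gu;

// Moves a single quote that ended up right after an opening <x> tag back in
// front of the tag, and inserts one space if the tag was glued to the
// preceding letter or digit. Strings without the misplaced quote are unchanged.
function restoreQuotedTags(translation) {
  return translation.replace(MISPLACED_QUOTE, (match, prev, space, openTag) =>
    prev ? `${prev} '${openTag}` : `${space}'${openTag}`
  );
}
```

Applied to the German output from step 3, restoreQuotedTags returns the expected output from step 4.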

Update
After creating this post I did some more testing. The quotes appear to trigger the issue: with double quotes instead of single quotes, the same problem occurs.
However, when the quotes around the XML tags are removed:

const result = await translator.translateText("Please start your <x id=p1>Basic</x> plan by clicking the button <x id=p2>Accept</x>.", "en", "de", { tagHandling: 'xml' });

The output maintains the spaces around the tags ✅:

Bitte starten Sie Ihren <x id=p1>Basic-Plan</x>, indem Sie auf die Schaltfläche <x id=p2>Akzeptieren</x> klicken.

Update 2
After creating this post I also noticed that I hadn't quoted the values of the id attribute (see the table "With Attributes" at https://developers.deepl.com/docs/xml-and-html-handling/xml), so my input string was actually malformed XML.

However, even with p1 and p2 quoted, the API still returns the same erroneous output:

const result = await translator.translateText(`Please start your '<x id="p1">Basic</x>' plan by clicking the button '<x id="p2">Accept</x>'.`, "en", "de", { tagHandling: 'xml' });
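
Building the source string with a small helper avoids hand-quoting mistakes such as the unquoted id values, and a JavaScript template literal lets the single quotes around the tags and the double quotes inside them coexist. A sketch assuming only the <x id="..."> markup used here; escapeText, escapeAttr and xTag are hypothetical names, not deepl-node functions:

```javascript
// Hypothetical helpers; not part of deepl-node.

// Escapes the characters that would break XML element content.
function escapeText(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Attribute values are written in double quotes, so " must be escaped as well.
function escapeAttr(value) {
  return escapeText(value).replace(/"/g, "&quot;");
}

// Builds <x id="...">text</x> with a properly quoted id attribute.
function xTag(id, text) {
  return `<x id="${escapeAttr(id)}">${escapeText(text)}</x>`;
}

// Template literals let the single quotes around the tags and the
// double quotes inside them coexist without backslash escapes.
const source =
  `Please start your '${xTag("p1", "Basic")}' plan ` +
  `by clicking the button '${xTag("p2", "Accept")}'.`;
```

The resulting source string can be passed to translator.translateText(source, "en", "de", { tagHandling: 'xml' }) in the same way as the calls above.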

Question
Why doesn't the API handle quotes around XML tags properly?

Screenshots
N/A

Desktop:

  • OS: macOS 14.5

Additional context

  • npm deepl-node 1.13.0
  • NodeJS 16.6.0
