TranslationRecognizer silently drops target languages with overlapping codes (en/en-US, fil/fi) #3024

# TranslationRecognizer silently drops target languages when using overlapping or prefix-colliding language codes in `add_target_language`

**IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:**

- **Speech SDK log**: Not yet captured. Can provide on request if needed. The issue is consistently reproducible with the combinations listed below.
- **Simplified source code**: Minimal reproduction script included below.
- **WAV file**: Not relevant — the issue occurs regardless of audio content. Any input audio that produces a successful `TranslatedSpeech` result will reproduce the problem.

---

**Describe the bug**

When calling `add_target_language()` with multiple target languages on `SpeechTranslationConfig`, certain combinations of language codes cause one or more translations to be **silently missing** from `result.translations`. The affected language key simply does not appear in the translations dictionary.

This happens in two scenarios:

1. **Same base language with different specificity**: e.g., `en` combined with `en-US`. One of the two is dropped in either order, which suggests the service internally resolves `en` to `en-US`. `en` followed by `en-GB` does not collide, which supports this theory, although `en-GB` followed by `en` does drop `en` (see the test results below).
2. **Prefix collision between different languages**: e.g., `fil` (Filipino) followed by `fi` (Finnish). When `fil` is added first, `fi` is dropped. The reversed order passes.

**Crucially, no error is raised.** The `Canceled` event does not fire. `result.reason` is still `ResultReason.TranslatedSpeech`. The failing language is simply absent from the result dictionary, causing silent data loss.
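Because no error surfaces, the omission can only be caught by comparing the requested targets against the keys that actually come back. A minimal sketch of such a check, assuming `result.translations` behaves like a mapping keyed by the exact codes passed to `add_target_language()`; the helper name `missing_translations` is hypothetical and not part of the Speech SDK:

```python
from typing import Iterable, List, Mapping


def missing_translations(requested: Iterable[str],
                         translations: Mapping[str, str]) -> List[str]:
    """Return requested target codes that have no entry in the result.

    Hypothetical helper, not part of the Speech SDK. Assumes the result
    keys match the codes passed to add_target_language() exactly.
    """
    return [lang for lang in requested if lang not in translations]


# Stand-in for result.translations in the failing ['en', 'en-US'] case.
observed = {"en": "Hello"}
print(missing_translations(["en", "en-US"], observed))
```

In the reproduction script this would be called as `missing_translations(target_langs, result.translations)` after checking `result.reason`.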

---

**To Reproduce**

1. Create a `SpeechTranslationConfig` using the v2 universal endpoint.
2. Call `add_target_language()` with one of the failing combinations listed below.
3. Perform speech recognition with any valid audio input.
4. Inspect `result.translations` — the dropped language key will be missing entirely.

### Minimal reproduction script

```python
import os
import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech import translation, languageconfig

# Fail fast with a KeyError if the credentials are not set.
speech_key = os.environ["AZURE_SPEECH_KEY"]
speech_region = os.environ["AZURE_SPEECH_REGION"]

endpoint = f"wss://{speech_region}.stt.speech.microsoft.com/speech/universal/v2"

translation_config = translation.SpeechTranslationConfig(
    subscription=speech_key, endpoint=endpoint
)

# ❌ BUG: "en-US" will be silently dropped from results
target_langs = ["en", "en-US"]

# ✅ WORKS: both translations appear
# target_langs = ["en", "en-GB"]

for lang in target_langs:
    translation_config.add_target_language(lang)

auto_detect = languageconfig.AutoDetectSourceLanguageConfig(
    languages=["ja-JP"]
)

audio_config = speechsdk.audio.AudioConfig(filename="test_audio.wav")

recognizer = translation.TranslationRecognizer(
    translation_config=translation_config,
    audio_config=audio_config,
    auto_detect_source_language_config=auto_detect,
)

result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print(f"Recognized: {result.text}")
    print(f"Translation keys: {list(result.translations.keys())}")
    for lang, text in result.translations.items():
        print(f"  [{lang}] {text}")
    # ⚠️ "en-US" key will be missing here — no error raised
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation = result.cancellation_details
    print(f"Canceled: {cancellation.reason}, {cancellation.error_details}")
```

### Test results — English locale combinations

| `add_target_language()` order | Result                                 |
| ----------------------------- | -------------------------------------- |
| `['en', 'en-GB', 'en-US']`    | ❌ **en-US** missing from translations |
| `['en', 'en-US', 'en-GB']`    | ❌ **en-US** missing from translations |
| `['en', 'en-GB']`             | ✅ Pass                                |
| `['en', 'en-US']`             | ❌ **en-US** missing from translations |
| `['en-GB', 'en', 'en-US']`    | ❌ **en** missing from translations    |
| `['en-GB', 'en-US', 'en']`    | ❌ **en** missing from translations    |
| `['en-GB', 'en-US']`          | ✅ Pass                                |
| `['en-GB', 'en']`             | ❌ **en** missing from translations    |
| `['en-US', 'en', 'en-GB']`    | ❌ **en** missing from translations    |
| `['en-US', 'en-GB', 'en']`    | ❌ **en** missing from translations    |
| `['en-US', 'en-GB']`          | ✅ Pass                                |
| `['en-US', 'en']`             | ❌ **en** missing from translations    |
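
The twelve rows above are every ordered pair and every ordered triple of the three codes. A sketch of how that matrix can be enumerated, so that each ordering can be fed to the reproduction script in turn; `build_orderings` is a hypothetical helper name, and the rows come out in a different order than in the table:

```python
from itertools import permutations
from typing import List, Tuple


def build_orderings(codes: List[str]) -> List[Tuple[str, ...]]:
    """Return every ordered selection of 2 and of 3 codes from the input."""
    orderings: List[Tuple[str, ...]] = []
    for size in (2, 3):
        orderings.extend(permutations(codes, size))
    return orderings


orderings = build_orderings(["en", "en-GB", "en-US"])
for order in orderings:
    # Each tuple is one add_target_language() call sequence to test.
    print(list(order))
```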

### Test results — Prefix collision (`fi` vs `fil`)

| `add_target_language()` order | Result                              |
| ----------------------------- | ----------------------------------- |
| `['fi', 'fil']`               | ✅ Pass                             |
| `['fil', 'fi']`               | ❌ **fi** missing from translations |

### Observed patterns

- **`en` and `en-US` always collide** regardless of order: one of the two is dropped in every combination that contains both. `en-US` + `en-GB` passes in either order, and `en` followed by `en-GB` passes. This strongly suggests the service resolves `en` → `en-US` internally.
- **`en` is dropped whenever a regional variant is added before it** (`en-GB` or `en-US`), including the two-language case `['en-GB', 'en']`. When `en` is added first, `en-US` is the one dropped. `en-GB` itself is never the missing key.
- **`fil` before `fi`** drops `fi`, but reversed order works. This points to a prefix-matching issue in internal language routing.
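
As a stopgap, the patterns above can be turned into a pre-flight check that flags any two target codes where one is a string prefix of the other. This is only a heuristic sketch based on the observations in this report, not on any documented service behavior; `overlapping_pairs` is a hypothetical helper name. It is deliberately conservative: it also flags `['en', 'en-GB']` and `['fi', 'fil']`, which passed in the tests above.

```python
from typing import List, Tuple


def overlapping_pairs(target_langs: List[str]) -> List[Tuple[str, str]]:
    """Return (shorter, longer) pairs where one code is a prefix of the other.

    Heuristic only: derived from the failures observed in this report.
    """
    pairs = []
    for i, first in enumerate(target_langs):
        for second in target_langs[i + 1:]:
            short, long_ = sorted((first, second), key=len)
            # Skip exact duplicates; flag e.g. 'en'/'en-US' and 'fi'/'fil'.
            if short != long_ and long_.startswith(short):
                pairs.append((short, long_))
    return pairs


for langs in (["en", "en-US"], ["fil", "fi"], ["en-GB", "en-US"]):
    print(langs, overlapping_pairs(langs))
```

Calling this before the `add_target_language()` loop gives the caller a chance to warn or abort instead of discovering a missing key after recognition.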

---

**Expected behavior**

Either:

- All languages passed to `add_target_language()` should produce a translation result in `result.translations`, **or**
- The SDK should raise an explicit error (e.g., `Canceled` event with error details) when an unsupported language combination is configured.

Silent omission of translation results with no error is the worst possible failure mode for a translation service.

---

**Version of the Cognitive Services Speech SDK**

`1.49.0` (`azure-cognitiveservices-speech`)

---

**Platform, Operating System, and Programming Language**

- **OS**: Linux (Ubuntu-based)
- **Hardware**: x64
- **Programming language**: Python 3.8

---

**Additional context**

- The documentation for [Language Identification](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-identification) states: _"Don't include multiple locales of the same language, for example, en-US and en-GB"_ — but this restriction is documented only for **Language Identification candidate languages**, not for **translation target languages**. The [Speech Translation how-to guide](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-translate-speech) has no equivalent warning.
- The [language support page](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support) recommends using language codes (e.g., `es` instead of `es-ES`) for translation targets, but does not document the collision behavior.
- If this behavior is by design, it should be explicitly documented, and the SDK should emit a warning or error at configuration time rather than silently dropping results.

