[Feature] Support Code-Mixed Text #110

## Problem

Real-world multilingual text often mixes languages within sentences:

```python
# Current behavior: fails on mixed-language text
text = "C'est vraiment amazing!"  # French-English
guardrail.validate(text)  # Incorrect results

text = "Das ist really gut"  # German-English
guardrail.validate(text)  # Fails
```

Code-mixing is extremely common in:
- Social media (majority of multilingual posts)
- Chat applications
- Informal communication
- Global communities

## Proposed Solution

```python
# Enhanced API
result = guardrail.validate(
    "C'est un deepfake, right?",
    handle_code_mixing=True
)

print(result.explanation)
# {
#   'languages_detected': ['fr', 'en'],
#   'code_mixed': True,
#   'primary_language': 'fr',
#   'mixing_ratio': {'fr': 0.7, 'en': 0.3}
# }
```
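As a rough sketch of how the `explanation` fields could be derived, assume a token-level detector has already assigned one language code to each token. The function name `summarize_mixing` and its input format are hypothetical and not part of the current API.

```python
from collections import Counter


def summarize_mixing(token_languages):
    """Build an explanation dict from per-token language labels.

    token_languages: list of language codes, one per token
    (hypothetical output of a token-level detector).
    """
    if not token_languages:
        return {
            "languages_detected": [],
            "code_mixed": False,
            "primary_language": None,
            "mixing_ratio": {},
        }

    counts = Counter(token_languages)
    total = len(token_languages)
    # most_common() sorts by frequency; ties keep first-seen order
    ordered = [lang for lang, _ in counts.most_common()]
    return {
        "languages_detected": ordered,
        "code_mixed": len(counts) > 1,
        "primary_language": ordered[0],
        "mixing_ratio": {lang: round(counts[lang] / total, 2) for lang in ordered},
    }


# "Das ist really gut", labelled by hand for illustration
print(summarize_mixing(["de", "de", "en", "de"]))
# {'languages_detected': ['de', 'en'], 'code_mixed': True,
#  'primary_language': 'de', 'mixing_ratio': {'de': 0.75, 'en': 0.25}}
```

Counting tokens is only one possible definition of the ratio; counting characters or segments would give different numbers.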

## Technical Requirements

1. **Token-level language detection**
2. **Multi-language embedding spaces**
3. **Smooth handling of script switches**
4. **Consistent detection across mixed segments**
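
For requirement 3, script switches can be found with the standard library alone. The sketch below is illustrative: the helper names `script_of` and `split_by_script` are made up for this issue, and the approach relies on Unicode character names starting with the script name. It only catches switches between scripts (for example Cyrillic to Latin), so same-script mixing such as French-English still needs real language detection.

```python
import unicodedata
from itertools import groupby


def script_of(char):
    """Return a coarse script label for one character, or None for non-letters."""
    if not char.isalpha():
        return None
    # Unicode names begin with the script, e.g. "CYRILLIC SMALL LETTER TE"
    name = unicodedata.name(char, "")
    return name.split(" ")[0] if name else None


def split_by_script(text):
    """Split whitespace-separated words into runs of (script, chunk)."""
    labelled = []
    for word in text.split():
        scripts = {script_of(c) for c in word} - {None}
        # Words with no letters, or with several scripts, are labelled UNKNOWN
        label = scripts.pop() if len(scripts) == 1 else "UNKNOWN"
        labelled.append((label, word))
    return [
        (script, " ".join(word for _, word in group))
        for script, group in groupby(labelled, key=lambda pair: pair[0])
    ]


print(split_by_script("Это очень cool"))
# [('CYRILLIC', 'Это очень'), ('LATIN', 'cool')]
```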

## Implementation Approach

```python
class CodeMixedProcessor:
    """Proposed structure; the three helper methods are not implemented yet."""

    def process(self, text: str):
        # Split the text into single-language segments
        segments = self.segment_by_language(text)

        # Process each segment with the model for its language
        results = []
        for segment in segments:
            model = self.get_model(segment.language)
            results.append(model.process(segment.text))

        # Combine the per-segment results into one overall result
        return self.aggregate(results)
```
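
A minimal sketch of what `segment_by_language` might do: label each token, then merge consecutive tokens that share a label. The `Segment` dataclass, the `TOY_LEXICON` lookup table, and `detect_token_language` are placeholders invented for illustration; a real token-level detector would replace the dictionary lookup.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    text: str
    language: str


# Toy lexicon for illustration only; not a real detector
TOY_LEXICON = {
    "das": "de", "ist": "de", "gut": "de",
    "really": "en", "totally": "en",
}


def detect_token_language(token, default="und"):
    """Hypothetical token-level detector: dictionary lookup.

    Returns "und" (the ISO 639 code for undetermined) for unknown tokens.
    """
    return TOY_LEXICON.get(token.lower().strip(".,!?"), default)


def segment_by_language(text):
    """Group consecutive tokens with the same language label into segments."""
    labelled = [(detect_token_language(tok), tok) for tok in text.split()]
    return [
        Segment(text=" ".join(tok for _, tok in group), language=lang)
        for lang, group in groupby(labelled, key=lambda pair: pair[0])
    ]


for seg in segment_by_language("Das ist really gut"):
    print(seg)
# Segment(text='Das ist', language='de')
# Segment(text='really', language='en')
# Segment(text='gut', language='de')
```

Splitting this finely means a one-word switch produces three segments, so the aggregation step would need to cope with very short inputs per model.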

## Why This Matters

- **Real-world usage**: The majority of casual multilingual communication is code-mixed
- **Current failure**: Guardrails give incorrect results on mixed text
- **Growing trend**: Code-mixing is increasing with global communication
- **Safety-critical**: Malicious content often uses code-mixing to evade detection

## Test Cases

```python
test_cases = [
    ("C'est totally bizarre", ['fr', 'en']),
    ("Das ist really gut", ['de', 'en']),
    ("Это очень cool", ['ru', 'en']),
]
```
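
One way these cases could be exercised is a small harness that takes any detector callable and reports mismatches. `check_detector` and the `always_english` stand-in are hypothetical names, and the harness assumes the detector returns a list of language codes; the order of the codes is ignored.

```python
def check_detector(detect_languages, cases):
    """Run a detector over (text, expected_languages) pairs and return failures."""
    failures = []
    for text, expected in cases:
        detected = detect_languages(text)
        # Compare as sets so the order of language codes does not matter
        if set(detected) != set(expected):
            failures.append((text, expected, detected))
    return failures


test_cases = [
    ("C'est totally bizarre", ["fr", "en"]),
    ("Das ist really gut", ["de", "en"]),
    ("Это очень cool", ["ru", "en"]),
]


def always_english(text):
    # Stand-in for a monolingual detector, used to show the failure report
    return ["en"]


failures = check_detector(always_english, test_cases)
print(len(failures))  # 3
```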

## Note

This is separate from Unicode/UA compliance. Even with perfect Unicode support, code-mixed text needs special handling for:
- Language model selection
- Tokenization boundaries
- Semantic understanding across languages

## References

- [Code-Mixing in NLP](https://arxiv.org/abs/2107.00323)
- [GLUECoS Benchmark](https://microsoft.github.io/GLUECoS/)
