@rdheekonda rdheekonda commented Dec 18, 2025

Add Language Adaptation Transforms for Cross-Lingual AIRT Probing

This PR introduces four language adaptation transforms (adapt_language, transliterate, code_switch, and dialectal_variation) to enable cross-lingual adversarial testing in AIRT workflows. These transforms allow red-teamers to test model robustness across different languages, dialects, and writing systems without relying on external translation APIs.

Key Changes:

  • Add language adaptation transforms for multilingual AIRT probing
  • Restore dataset injection logic for TAP/attack workflows to enable transform functionality
  • Update TAP example notebook with 6 different language-based attack demonstrations
  • Fix transform hook to properly pass data to decorated tasks

Problem:

In TAP and other attack workflows, transforms were receiving empty data because:

  • Attack candidates are baked into tasks as default arguments
  • The dataset is a single empty row ([{}]) in attack scenarios
  • Transform hooks couldn't access the actual candidate to transform it

Solution:
When attack mode is detected (empty dataset + Message candidate), the candidate is injected into the dataset so transforms can access and modify it before it reaches the target model.
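
The injection rule described above can be sketched roughly as follows. This is a minimal, standalone illustration, not the code in dreadnode/optimization/study.py: the `Message` dataclass is a stand-in for the SDK type, and `inject_candidate` is a hypothetical helper name. Only the condition (`dataset == [{}]` plus a `Message` candidate) and the `"message"` key come from this PR description.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Message:
    """Stand-in for the SDK's Message type (assumption for this sketch)."""

    role: str
    content: str


def inject_candidate(
    dataset: list[dict[str, Any]],
    candidate: Any,
) -> tuple[list[dict[str, Any]], Optional[list[str]]]:
    """Hypothetical helper: return (dataset, dataset_input_mapping)."""
    # Attack mode: the dataset is a single empty row and the candidate is a Message.
    if dataset == [{}] and isinstance(candidate, Message):
        # Put the candidate in the row so transform hooks can see and modify it.
        return [{"message": candidate}], ["message"]
    # Any other case: leave the dataset untouched and add no mapping.
    return dataset, None
```

With this shape, a non-empty dataset or a non-Message candidate passes through unchanged, so regular evaluation runs should be unaffected.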

Added:
dreadnode/transforms/language.py:

  • adapt_language(): LLM-powered language adaptation with style control
  • transliterate(): Phonetic script conversion with custom mapping support
  • code_switch(): Multilingual code-switching for mixed-language testing
  • dialectal_variation(): Regional dialect adaptation (AAVE, Singlish, etc.)
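
To give a feel for what "phonetic script conversion with custom mapping support" can look like, here is a small standalone sketch. It is not the implementation in dreadnode/transforms/language.py: the function name `transliterate_sketch`, its signature, and the Latin-to-Cyrillic table are all assumptions made for illustration. It matches the longest key first, so digraphs such as "sh" win over single letters.

```python
from typing import Mapping, Optional

# Illustrative Latin-to-Cyrillic table (an assumption, not the PR's mapping).
_DEFAULT_MAP = {
    "sh": "ш", "ch": "ч", "zh": "ж", "ya": "я", "yu": "ю", "ts": "ц",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "u": "у",
    "f": "ф", "h": "х", "y": "й",
}


def transliterate_sketch(
    text: str,
    custom_mapping: Optional[Mapping[str, str]] = None,
) -> str:
    """Greedy longest-match transliteration; custom keys should be lowercase."""
    # Custom entries override or extend the default table.
    table = {**_DEFAULT_MAP, **(custom_mapping or {})}
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            if len(chunk) != size:
                continue  # not enough characters left for this key length
            mapped = table.get(chunk.lower())
            if mapped is not None:
                # Carry over capitalization of the first source character.
                out.append(mapped.capitalize() if chunk[0].isupper() else mapped)
                i += size
                break
        else:
            # No key matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, `transliterate_sketch("Moskva")` yields "Москва", and passing `{"x": "кс"}` as `custom_mapping` turns "taxi" into "такси". Characters without a mapping (digits, spaces, unmapped letters) are left as they are.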

examples/airt/tree_of_attacks_with_pruning_transforms.ipynb:

  • 6 complete attack demonstrations using different language transforms
  • Attack 1: Basic character-level transform (baseline)
  • Attack 2: Spanish formal language adaptation
  • Attack 3: Swahili (low-resource language testing)
  • Attack 4: Spanglish code-switching
  • Attack 5: AAVE dialectal variation
  • Attack 6: Cyrillic transliteration

Changed:

dreadnode/optimization/study.py:

  • Restored dataset injection logic in _run_evaluation() for attack scenarios
  • Simplified condition: if dataset == [{}] and isinstance(trial.candidate, Message)
  • Added dataset_input_mapping=["message"] when passing to Eval

Generated Summary:

  • Refactored variable names in apply_transforms function for clarity and consistency.
  • Introduced a new file dreadnode/transforms/language.py which includes:
    • adapt_language: A transform for adapting text to a target language with style adjustments.
    • transliterate: Converts Latin script to various writing systems phonetically.
    • code_switch: Mixes multiple languages in a single text.
    • dialectal_variation: Adapts text to specific regional dialects or variations.
  • Updated dreadnode/optimization/study.py to enhance handling of empty datasets and introduce a check for message types in dataset_input_mapping.
  • Modified the example notebook to reflect new functionality:
    • Updated introduction to emphasize language transformation features.
    • Added various attacks demonstrating cross-lingual adaptation, code-switching, and dialectal variations.
  • Enhanced user prompts in the code examples for clarity on the objectives of each attack scenario.

These changes expand the capabilities of the Dreadnode SDK for multilingual and dialectal text transformations, improving its utility for diverse language applications and security testing scenarios.

This summary was generated with ❤️ by rigging

@dreadnode-renovate-bot dreadnode-renovate-bot bot added the area/examples Changes to example code and demonstrations label Dec 18, 2025
@rdheekonda rdheekonda requested a review from monoxgas December 18, 2025 17:31
@dreadnode-renovate-bot dreadnode-renovate-bot bot added the area/python Changes to Python package configuration and dependencies label Dec 18, 2025