tuner/data_augment/README.md: 1 addition, 1 deletion
@@ -10,7 +10,7 @@ Training can be inefficient if tasks are too easy or too hard. This example demo

## Dataset Preparation

To enable difficulty-based sampling, the training data must include difficulty features (e.g., pass rates from LLMs).

1. **Base Dataset**: You can use any standard math problem dataset. A good example is the math data in [LLM360/guru-RL-92k](https://huggingface.co/datasets/LLM360/guru-RL-92k), which comes pre-annotated with pass rates from different LLMs that serve directly as difficulty features.
2. **Build Your Own Features**: If you use your own dataset, you can generate these features by pre-running several models of varying capabilities and recording their pass rates. This can be done within the [**Trinity-RFT**](https://github.com/agentscope-ai/Trinity-RFT/pull/440) framework.

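
As a rough illustration of what "recording pass rates" can look like, the sketch below turns per-model attempt outcomes into difficulty features on a single example. The function names, the `pass_rate_<model>` column naming, and the model labels are all hypothetical; this is not the schema of guru-RL-92k nor the Trinity-RFT API, only a minimal sketch of the idea.

```python
def pass_rate(outcomes):
    """Fraction of attempts judged correct (True) for one model on one task."""
    if not outcomes:
        raise ValueError("need at least one attempt to compute a pass rate")
    return sum(outcomes) / len(outcomes)


def add_difficulty_features(example, outcomes_by_model):
    """Return a copy of `example` with one pass-rate field per model.

    The `pass_rate_<model>` field names are an assumption for illustration.
    """
    enriched = dict(example)
    for model_name, outcomes in outcomes_by_model.items():
        enriched[f"pass_rate_{model_name}"] = pass_rate(outcomes)
    return enriched


# Hypothetical task and hypothetical rollout results from two models.
example = {"question": "What is 7 * 8?", "answer": "56"}
outcomes_by_model = {
    "small_model": [True, False, False, False],
    "large_model": [True, True, True, False],
}

row = add_difficulty_features(example, outcomes_by_model)
print(row)
```

A task that a weaker model rarely solves but a stronger model usually solves lands in the middle of the difficulty range, which is the kind of signal a difficulty-based sampler can use.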
Below is a concrete example comparing a similar scenario before and after training.

#### Concrete Example

**Similar Scenario (Day 1):** The Seer is killed on Night 1. In their final statement, the Seer reveals verified information about another player.

---

**❌ Before Training: Villagers Vote Out Their Own Witch**

*Game State:* The Seer (Player7) dies on Night 1 and leaves only a vague warning in the final statement: "Player4 has a powerful role, be careful..." (Player4 is actually the Witch.)

*What happens:*
- Player1 (Villager) immediately suspects Player4 based on the ambiguous warning
- Other villagers follow blindly without questioning the logic
- Player4 (Witch) defends herself but fails to reveal her role clearly
- **Result: Player4 (Witch) is voted out 5:1.** The good guys lose their strongest asset

*Private Reasoning from Player1 (Villager):*
```
"Player7's warning wasn't random—it was a signal about Player4.
Player4 is being defensive, which is suspicious.
I should vote Player4 out to protect the team."
```

*Public Statement:*
```
❌ "I agree with Player7's warning—trust no one blindly. Player4, your behavior
has been quiet so far. I think we should seriously consider voting out Player4."
```

**Why it fails:**
- The Villager acts as a "mob leader" without analyzing the evidence
- Ignores that the Seer's warning could have multiple interpretations
- Does not consider that Player4 might be a special role worth protecting
- **Final Result: Werewolves win.** The Witch is eliminated on Day 1 with no potions used

---

**✅ After Training: Witch Carries with Strategic Play**

*Game State:* The Seer (Player2) dies on Night 1 but leaves clear information: "I checked Player4 and confirmed they are a villager."

*What happens (multi-round strategic play):*

**Night 2: Witch Self-Rescue**
When the werewolves target Player1 (Witch), she makes a critical decision:

*Private Reasoning from Player1 (Witch):*
```
"I'm the witch, and I was just targeted by werewolves.
The seer is dead, so I'm the last special role standing.
If I die now, villagers have no way to eliminate werewolves at night.

Strategy: Self-rescue is worth it. I preserve myself to gather more
information during Day 2, and save my poison for when I can identify
a werewolf with high confidence."
```

*Result:* The Witch survives, announces her identity on Day 2, and gains the villagers' trust.

**Day 2: Villagers Follow Logic**
With the Witch confirmed, the villagers analyze behavior patterns:

*Private Reasoning from Player4 (Villager):*
```
"Player1 claimed witch and self-rescued—this is verifiable since
no one died last night. I trust her now.

Player5 has been pushing aggressive votes without evidence.
Combined with Player1's analysis, Player5 is most suspicious."
```

*Public Statement:*
```
✅ "I support Player1's assessment. Player5's behavior has been inconsistent—
they were quick to accuse others but offered no logical reasoning.
I vote to eliminate Player5."
```

**Night 3: Decisive Poison**

*Private Reasoning from Player1 (Witch):*
```
"Player5 (werewolf) is out. One werewolf remains.
Player6 has been too quiet and always followed the majority without
contributing original analysis—classic deep cover behavior.

I'm confident Player6 is the last werewolf. Using poison now."
```

**Why it works:**
- The Witch preserves the healing potion for self-rescue at a critical moment
- Villagers trust verified information (the Witch's self-rescue proof)
- The team builds consensus through logical deduction, not mob voting
- The Witch uses poison decisively based on behavioral analysis
- **Final Result: Good guys win.** The Witch drives the elimination of both werewolves (one by vote, one by poison)

---

This demonstrates the essence of trained good-guy behavior: **strategic resource management, evidence-based reasoning, and team coordination**. The model learns that preserving special roles and building consensus through logic are more valuable than aggressive early voting.

**Role-Specific Advanced Patterns:**

- **Seer**: Strategic target selection, information concealment in public statements, evidence integration