[FEATURE] Vinto Bot Improvement Plan

1. Hybrid Architecture: AlphaVinto
Instead of pure MCTS, combine multiple AI paradigms:

┌─────────────────────────────────────────────────────────┐
│                    Decision Engine                       │
├─────────────┬─────────────┬─────────────┬───────────────┤
│   Neural    │    MCTS     │  Rule-Based │   Opponent    │
│  Evaluator  │   Search    │  Heuristics │    Models     │
├─────────────┴─────────────┴─────────────┴───────────────┤
│              Unified Belief State Manager                │
├─────────────────────────────────────────────────────────┤
│                 Game State Observer                      │
└─────────────────────────────────────────────────────────┘

Why? Each component handles what it's best at:

Neural net: Fast pattern recognition, position evaluation
MCTS: Deep tactical search when needed
Rules: Handle known-optimal plays instantly (never swap away Joker)
Opponent models: Exploit predictable players
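
The layering above could be wired together roughly as follows. This is a minimal sketch, not the bot's actual code: `BotAction`, `DecisionContext`, `RuleShortcut`, `ActionScorer`, and `chooseAction` are hypothetical names, and the scorer stands in for whatever search or neural layer is plugged in.

```typescript
// Hypothetical types for illustration; not part of the existing bot.
type BotAction = { type: string; position?: number };

interface DecisionContext {
  legalActions: BotAction[];
  holdsJokerAt: number[]; // hand positions known to hold a Joker
}

// A rule returns a forced action, or null to defer to the next layer.
type RuleShortcut = (ctx: DecisionContext) => BotAction | null;

// A slower layer (MCTS, neural evaluator, ...) scores a candidate action.
type ActionScorer = (ctx: DecisionContext, action: BotAction) => number;

// Rule from the list above: never swap away a known Joker.
function withoutJokerSwaps(ctx: DecisionContext): BotAction[] {
  return ctx.legalActions.filter(
    (a) =>
      !(a.type === "swap" &&
        a.position !== undefined &&
        ctx.holdsJokerAt.indexOf(a.position) !== -1),
  );
}

function chooseAction(
  ctx: DecisionContext,
  rules: RuleShortcut[],
  scorer: ActionScorer,
): BotAction {
  // 1. Rules answer instantly when a known-optimal play exists.
  for (const rule of rules) {
    const forced = rule(ctx);
    if (forced !== null) return forced;
  }
  // 2. Otherwise score the remaining candidates with the slower layer.
  const filtered = withoutJokerSwaps(ctx);
  const pool = filtered.length > 0 ? filtered : ctx.legalActions;
  if (pool.length === 0) throw new Error("no legal actions");
  let best = pool[0];
  let bestScore = scorer(ctx, best);
  for (const candidate of pool.slice(1)) {
    const score = scorer(ctx, candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

Running the rules before the search keeps the cheap, certain decisions out of the expensive path, and the Joker filter guarantees the search can never override that rule.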

2. Information Set MCTS (IS-MCTS)
Current approach: Random determinization (sample possible opponent cards randomly)
New approach: Belief-Weighted Particle Filtering
interface BeliefState {
  // For each unknown card location, maintain probability distribution
  cardBeliefs: Map<CardLocation, CardDistribution>;
  
  // Particles representing possible game states
  particles: GameStateParticle[];
  
  // Confidence in our beliefs
  entropy: number;
}

interface CardDistribution {
  probabilities: Map<Card, number>;  // Sum to 1.0
  lastUpdated: number;
  evidenceHistory: Evidence[];
}

class BeliefStateManager {
  // Update beliefs based on observations
  updateFromAction(action: GameAction): void {
    // If opponent draws and keeps: update belief toward valuable cards
    // If opponent swaps: strong signal about relative values
    // If opponent peeks and reacts: information about what they saw
    
    this.particles = this.particles
      .map(p => this.updateParticle(p, action))
      .filter(p => p.weight > THRESHOLD);
    
    this.resampleIfNeeded();
  }
  
  // Sample game states weighted by belief probability
  sampleDeterminization(): ConcreteGameState {
    const particle = this.weightedSample(this.particles);
    return particle.toConcreteState();
  }
}

Key insight: Instead of treating all unknown cards as equally likely, track evidence and maintain probability distributions.
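
To make the idea concrete, here is a minimal sketch of a single Bayesian update for one unknown card slot. It is deliberately simpler than the `CardDistribution` interface above: beliefs are a plain rank-to-probability record, and `priorFromUnseen`, `updateBelief`, `entropyBits`, and the likelihood numbers in the usage note are all illustrative assumptions.

```typescript
// Hypothetical simplified belief: rank -> probability (sums to 1).
type RankBelief = Record<string, number>;

// Scale weights so they sum to 1; null if there is no probability mass.
function normalize(weights: RankBelief): RankBelief | null {
  const ranks = Object.keys(weights);
  const total = ranks.reduce((sum, r) => sum + weights[r], 0);
  if (total <= 0) return null;
  const out: RankBelief = {};
  for (const r of ranks) out[r] = weights[r] / total;
  return out;
}

// Prior: proportional to how many copies of each rank are still unseen.
function priorFromUnseen(unseenCounts: Record<string, number>): RankBelief {
  const prior = normalize(unseenCounts);
  if (prior === null) throw new Error("no unseen cards left");
  return prior;
}

// Bayes rule: posterior(rank) ∝ prior(rank) × P(observation | rank).
function updateBelief(
  prior: RankBelief,
  likelihood: (rank: string) => number,
): RankBelief {
  const weighted: RankBelief = {};
  for (const r of Object.keys(prior)) weighted[r] = prior[r] * likelihood(r);
  // If the evidence rules out every rank, keep the prior unchanged.
  return normalize(weighted) ?? { ...prior };
}

// Shannon entropy in bits; lower means a more confident belief.
function entropyBits(belief: RankBelief): number {
  let h = 0;
  for (const r of Object.keys(belief)) {
    const p = belief[r];
    if (p > 0) h -= (p * Math.log(p)) / Math.LN2;
  }
  return h;
}
```

For example, if an opponent draws and keeps a card, a likelihood function that returns a higher value for low ranks shifts probability toward those ranks, and `entropyBits` drops accordingly.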

3. Counterfactual Regret Minimization (CFR) for Equilibrium Play
For situations where game theory matters (bluffing, Vinto timing):
```
class CFRTrainer {
  // Regret tables: "How much do I regret not taking action A in state S?"
  regrets: Map<InfoSetKey, Map<Action, number>>;
  
  // Strategy tables: Current mixed strategy
  strategy: Map<InfoSetKey, Map<Action, number>>;
  
  // Train through self-play
  async train(iterations: number): Promise<void> {
    for (let i = 0; i < iterations; i++) {
      // Play game against self
      const utilities = await this.traverseGameTree(initialState, [], []);
      
      // Update regrets based on "what if I had played differently?"
      this.updateRegrets(utilities);
      
      // Update strategy using regret matching
      this.updateStrategy();
    }
  }
  
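  // Get action probabilities for an information set
  getStrategy(infoSet: InfoSetKey): Map<Action, number> {
    // Returns the current mixed strategy (probabilities over actions).
    // Note: it is the average strategy over all iterations that converges
    // toward a Nash equilibrium, and only in two-player zero-sum games.
    return this.strategy.get(infoSet) ?? new Map<Action, number>();
  }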
}
```

Use cases:

When to call Vinto (optimal timing is likely a mixed strategy rather than a fixed threshold)
Whether to use Queen on self vs opponent
King declaration target selection (sometimes suboptimal play confuses opponents)
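
The "regret matching" step mentioned in the trainer can be sketched in isolation. This is a simplified single-information-set version under stated assumptions: actions are array indices, and the update omits the counterfactual reach-probability weighting that full CFR applies. `regretMatching` and `accumulateRegret` are illustrative names.

```typescript
// Regret matching: play each action in proportion to its positive
// cumulative regret; fall back to uniform when no regret is positive.
function regretMatching(cumulativeRegret: number[]): number[] {
  const positive = cumulativeRegret.map((r) => Math.max(r, 0));
  const total = positive.reduce((a, b) => a + b, 0);
  const n = cumulativeRegret.length;
  if (total <= 0) return positive.map(() => 1 / n);
  return positive.map((r) => r / total);
}

// One regret update: regret for action i grows by how much better
// action i would have done than the strategy's expected utility.
function accumulateRegret(
  cumulativeRegret: number[],
  actionUtilities: number[],
  strategy: number[],
): number[] {
  const expected = actionUtilities.reduce(
    (sum, u, i) => sum + u * strategy[i],
    0,
  );
  return cumulativeRegret.map((r, i) => r + (actionUtilities[i] - expected));
}
```

With cumulative regrets `[3, -1, 1]`, `regretMatching` returns `[0.75, 0, 0.25]`: the action with negative regret is never played, and the other two are mixed in a 3:1 ratio.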

4. Deep Opponent Modeling

```
interface PlayerProfile {
  // Play style classification
  aggression: number;        // 0-1: How quickly they call Vinto
  riskTolerance: number;     // 0-1: Willingness to take uncertain swaps
  bluffFrequency: number;    // 0-1: How often they misrepresent hand strength
  
  // Behavioral patterns
  patterns: {
    // "When they peek their own card, they swap it X% of the time"
    peekOwnThenSwap: number;
    // "When they have King, they use it within N turns"
    kingUsageSpeed: number;
    // "They call Vinto at average score of X"
    vintoThreshold: number;
    // "They prioritize removing cards over point reduction"
    removalVsPoints: number;
  };
  
  // Historical accuracy
  predictionHistory: PredictionResult[];
  modelConfidence: number;
}

class OpponentProfiler {
  // Build profile from observations
  updateProfile(player: PlayerId, action: GameAction, context: GameContext): void;
  
  // Predict likely actions
  predictAction(player: PlayerId, state: GameState): ActionDistribution;
  
  // Detect if opponent is exploitable
  findExploits(profile: PlayerProfile): ExploitStrategy[];
}
```

Advanced feature: Opponent Model Selection
```
// Maintain multiple hypothesis models per opponent
class MultiModelTracker {
  models: Map<PlayerId, PlayerProfile[]>;
  modelWeights: Map<PlayerId, number[]>;
  
  // Bayesian update: which model best explains observed behavior?
  updateModelWeights(player: PlayerId, action: GameAction): void {
    const likelihoods = this.models.get(player)!
      .map(model => this.actionLikelihood(action, model));
    
    // Update weights proportional to likelihood
    this.modelWeights.set(player, 
      this.bayesianUpdate(this.modelWeights.get(player)!, likelihoods)
    );
  }
  
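  // Use best model(s) for prediction
  getBestModel(player: PlayerId): PlayerProfile {
    // Returns the single highest-weight model; a weighted mixture is an alternative
    const models = this.models.get(player)!;
    const weights = this.modelWeights.get(player)!;
    return models[weights.indexOf(Math.max(...weights))];
  }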
}
```
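
The `bayesianUpdate` helper used above is not defined in this plan. A minimal sketch of what it might look like follows; the `floor` parameter is an added assumption that keeps a single surprising action from permanently eliminating a model.

```typescript
// Posterior weight of each model ∝ prior weight × likelihood of the action.
function bayesianUpdate(
  weights: number[],
  likelihoods: number[],
  floor = 1e-6, // assumption: minimum likelihood so no model hits exactly zero
): number[] {
  if (weights.length !== likelihoods.length) {
    throw new Error("weights and likelihoods must have the same length");
  }
  const unnormalized = weights.map(
    (w, i) => w * Math.max(likelihoods[i], floor),
  );
  const total = unnormalized.reduce((a, b) => a + b, 0);
  // All prior weights were zero: reset to a uniform distribution.
  if (total === 0) return weights.map(() => 1 / weights.length);
  return unnormalized.map((w) => w / total);
}
```

Starting from equal weights `[0.5, 0.5]` and likelihoods `[0.9, 0.1]`, the updated weights are approximately `[0.9, 0.1]`.
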
5. Neural Network State Evaluator
Instead of hand-crafted evaluation functions:
```
interface NeuralEvaluator {
  // Input features
  encodeState(state: GameState, perspective: PlayerId): Float32Array;
  
  // Output
  evaluate(encoding: Float32Array): {
    winProbability: number;      // P(win game)
    expectedScore: number;       // E[final score]
    actionValues: Map<Action, number>;  // Q-values for each action
  };
}

// Feature encoding (example)
function encodeState(state: GameState, me: PlayerId): Float32Array {
  return new Float32Array([
    // My known cards (one-hot encoded: 13 ranks × 4 suits × 4 positions)
    ...encodeKnownCards(state, me),
    
    // My score estimate
    state.getScore(me) / 50,  // Normalized
    
    // Opponent scores
    ...state.opponents(me).map(o => state.getScore(o) / 50),
    
    // Cards remaining in deck
    state.deckSize / 52,
    
    // Turn number
    state.turnNumber / 30,
    
    // Phase encoding
    state.phase === 'main' ? 1 : 0,
    state.phase === 'final' ? 1 : 0,
    
    // Who called Vinto (if any)
    ...encodeVintoCaller(state, me),
    
    // Belief state summary
    ...encodeBeliefs(state.beliefs, me),
    
    // Recent action history
    ...encodeRecentActions(state, 5),
  ]);
}
```
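
The one-hot layout named in the comment (13 ranks × 4 suits × 4 positions) could be produced as in the sketch below. It takes a plain hand array rather than the `(state, me)` arguments used above, and `KnownCard`, `RANKS`, `SUITS`, and `encodeKnownCardsSketch` are hypothetical names. Jokers are not covered by this layout and would need an extra slot.

```typescript
const RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"];
const SUITS = ["♠", "♥", "♦", "♣"];

interface KnownCard {
  rank: string;
  suit: string;
}

// One-hot encode up to `positions` hand slots; null means "card unknown".
function encodeKnownCardsSketch(
  hand: Array<KnownCard | null>,
  positions = 4,
): Float32Array {
  const perSlot = RANKS.length * SUITS.length; // 52 entries per hand position
  const out = new Float32Array(positions * perSlot);
  hand.slice(0, positions).forEach((card, pos) => {
    if (card === null) return; // unknown slot stays all zeros
    const r = RANKS.indexOf(card.rank);
    const s = SUITS.indexOf(card.suit);
    if (r < 0 || s < 0) throw new Error("unrecognized card");
    out[pos * perSlot + s * RANKS.length + r] = 1;
  });
  return out;
}
```

The result always has 208 entries, with at most one entry set per hand position, so the network input size stays fixed regardless of how many cards are known.
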
Training approach:

Self-play to generate games
Train to predict game outcome from any position
Use as evaluation function in MCTS
Fine-tune on edge cases

6. Hierarchical Decision Making
Structure decisions at multiple levels:
```
// Strategic layer: What are we trying to achieve?
interface Strategy {
  goal: 'minimize_score' | 'prevent_opponent_vinto' | 'setup_vinto' | 'support_coalition';
  confidence: number;
  horizon: number;  // How many turns to plan
}

// Tactical layer: How do we achieve it this turn?
interface TacticalPlan {
  strategy: Strategy;
  immediateActions: Action[];
  contingencies: Map<Observation, Action[]>;
}

// Execution layer: Carry out the plan
class HierarchicalBot {
  strategyPlanner: StrategyPlanner;
  tacticalPlanner: TacticalPlanner;
  executor: ActionExecutor;
  
  async decideAction(state: GameState): Promise<Action> {
    // 1. Update or maintain current strategy
    const strategy = await this.strategyPlanner.evaluate(state);
    
    // 2. Generate tactical plan for this strategy
    const plan = await this.tacticalPlanner.plan(state, strategy);
    
    // 3. Execute first action of plan
    return this.executor.execute(plan);
  }
}
```
Example strategies:
```
const strategies = {
  rushVinto: {
    // Aggressively minimize score to call Vinto early
    prioritize: ['remove_cards', 'minimize_points'],
    vintoThreshold: 8,
  },
  
  controlGame: {
    // Accumulate information and action cards
    prioritize: ['gather_info', 'retain_action_cards'],
    vintoThreshold: 5,
  },
  
  blockOpponent: {
    // Prevent specific opponent from winning
    prioritize: ['increase_target_score', 'deny_low_cards'],
    targetPlayer: 'lowest_score',
  },
  
  coalitionChampion: {
    // Final round: I'm the one who needs to win
    prioritize: ['minimize_own_score', 'receive_support'],
  },
  
  coalitionSupport: {
    // Final round: Help the champion
    prioritize: ['transfer_resources', 'attack_vinto_caller'],
  },
};
```
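
As an illustration of how `vintoThreshold` might be consumed, the sketch below assumes the threshold means "call Vinto once the estimated hand score is at or below this value". That interpretation, the `VintoPolicy` type, and the `expectedUnknownValue` parameter are all assumptions, not something this plan specifies.

```typescript
// Hypothetical subset of a strategy entry that carries a Vinto threshold.
interface VintoPolicy {
  vintoThreshold: number;
}

// Estimate the hand score as known points plus an expected value for each
// card we have not seen, then compare against the strategy's threshold.
function shouldCallVinto(
  knownScore: number,
  unknownCardCount: number,
  expectedUnknownValue: number, // assumption: average value of an unseen card
  policy: VintoPolicy,
): boolean {
  const estimate = knownScore + unknownCardCount * expectedUnknownValue;
  return estimate <= policy.vintoThreshold;
}
```

With 4 known points and one unknown card valued at 3, the estimate is 7: the `rushVinto` threshold of 8 says call, while the `controlGame` threshold of 5 says keep playing.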

7. Implicit Communication in Coalition Play
Players can't talk, but they can signal through actions:
```
interface SignalingProtocol {
  // Encoding intentions through play
  signals: {
    // "I have a Joker" - demonstrated by keeping a card after King declaration
    jokerSignal: (action: GameAction) => boolean;
    
    // "I can help you" - peeking coalition member's card
    supportSignal: (action: GameAction) => boolean;
    
    // "Attack this player" - using Queen on specific opponent
    targetSignal: (action: GameAction) => PlayerId | null;
    
    // "I'm ready to be champion" - specific swap patterns
    championSignal: (action: GameAction) => boolean;
  };
  
  // Decoding coalition member actions
  interpretAction(player: PlayerId, action: GameAction): IntentEstimate;
  
  // Planning actions that communicate intent
  generateSignalingAction(intent: Intent, availableActions: Action[]): Action;
}
```

8. Adaptive Difficulty & Personality System
Make the bot customizable:
```
interface BotPersonality {
  name: string;
  
  // Skill settings
  searchDepth: number;
  beliefAccuracy: number;  // How well it tracks cards
  mistakeRate: number;     // Occasionally suboptimal plays
  
  // Style settings
  aggression: number;
  riskTolerance: number;
  socialPlay: number;      // Coalition cooperation level
  
  // Behavioral quirks
  quirks: {
    favoriteCard?: Rank;   // Slight preference for keeping this
    hatesToLose?: boolean; // Takes risks when behind
    showsOff?: boolean;    // Prefers flashy plays
  };
}

const personalities: BotPersonality[] = [
  {
    name: "Cautious Carl",
    searchDepth: 3,
    beliefAccuracy: 0.7,
    mistakeRate: 0.05,
    aggression: 0.3,
    riskTolerance: 0.2,
    socialPlay: 0.8,
    quirks: { hatesToLose: false },
  },
  {
    name: "Aggressive Anna",
    searchDepth: 4,
    beliefAccuracy: 0.8,
    mistakeRate: 0.02,
    aggression: 0.9,
    riskTolerance: 0.7,
    socialPlay: 0.5,
    quirks: { hatesToLose: true },
  },
  // ... more personalities
];
```
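
One possible way to apply `mistakeRate` is sketched below: with that probability the bot picks uniformly among the non-best actions instead of the best one. `ScoredAction` and `pickWithMistakes` are hypothetical names, and the random source is injectable so the behavior can be tested deterministically.

```typescript
interface ScoredAction<A> {
  action: A;
  value: number; // higher is better
}

function pickWithMistakes<A>(
  scored: ScoredAction<A>[],
  mistakeRate: number,
  random: () => number = Math.random,
): A {
  if (scored.length === 0) throw new Error("no actions to choose from");
  // Sort a copy so the caller's array is left untouched.
  const sorted = scored.slice().sort((a, b) => b.value - a.value);
  const blunder = sorted.length > 1 && random() < mistakeRate;
  if (!blunder) return sorted[0].action;
  // Choose uniformly among everything except the best action.
  const idx = 1 + Math.floor(random() * (sorted.length - 1));
  return sorted[Math.min(idx, sorted.length - 1)].action;
}
```

A softer alternative would weight mistakes toward near-best actions, which tends to look more human than a uniformly random blunder.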

9. Explainable Decisions
Help players understand (and learn from) the bot:
```
interface DecisionExplanation {
  action: Action;
  confidence: number;
  
  reasoning: {
    // Why this action?
    primaryReason: string;
    // What alternatives were considered?
    alternatives: Array<{
      action: Action;
      reason: string;
      whyRejected: string;
    }>;
    // Key factors
    factors: Array<{
      name: string;
      value: number;
      impact: 'positive' | 'negative';
    }>;
  };
  
  // Natural language
  humanReadable: string;
}

// Example output:
{
  action: { type: 'swap', position: 2, discardedCard: '7♠' },
  confidence: 0.85,
  reasoning: {
    primaryReason: "Drew a 3, which is lower than my known 7 at position 2",
    alternatives: [
      {
        action: { type: 'discard' },
        reason: "Keep current hand",
        whyRejected: "7 is too high, reducing score is priority"
      }
    ],
    factors: [
      { name: 'point_reduction', value: 4, impact: 'positive' },
      { name: 'card_knowledge', value: 0.8, impact: 'positive' },
      { name: 'cascade_potential', value: 2, impact: 'positive' },
    ]
  },
  humanReadable: "I swapped my 7 for the 3 I drew. This saves 4 points, and the 7 I discarded might trigger a toss-in cascade!"
}
```
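
A template-based generator is one simple way to derive a readable summary from the structured fields. The sketch below is an assumption about how that could work, not the planned implementation; `ExplanationFactor` and `summarizeFactors` are hypothetical names, and the output is plainer than the hand-written `humanReadable` example above.

```typescript
interface ExplanationFactor {
  name: string;
  value: number;
  impact: "positive" | "negative";
}

// Build a one-line summary from the primary reason and the largest factors.
function summarizeFactors(
  primaryReason: string,
  factors: ExplanationFactor[],
  top = 2,
): string {
  const ranked = factors
    .slice()
    .sort((a, b) => Math.abs(b.value) - Math.abs(a.value))
    .slice(0, top);
  const parts = ranked.map((f) => {
    const sign = f.impact === "positive" ? "+" : "-";
    return `${f.name.replace(/_/g, " ")} (${sign}${Math.abs(f.value)})`;
  });
  return parts.length > 0
    ? `${primaryReason}. Key factors: ${parts.join(", ")}.`
    : `${primaryReason}.`;
}
```

Ranking by absolute value keeps the summary short while surfacing the factors that moved the decision most.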

---

10. Architecture Overview
```
┌────────────────────────────────────────────────────────────────┐
│                         VintoBot 2.0                           │
├────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐ │
│  │   Belief     │  │   Opponent   │  │     Strategy         │ │
│  │   State      │◄─┤   Profiler   │◄─┤     Selector         │ │
│  │   Manager    │  │              │  │                      │ │
│  └──────┬───────┘  └──────┬───────┘  └──────────┬───────────┘ │
│         │                 │                      │             │
│         ▼                 ▼                      ▼             │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                   Decision Engine                        │  │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────────┐  │  │
│  │  │ Neural  │ │  MCTS   │ │   CFR   │ │  Rule-Based   │  │  │
│  │  │  Eval   │ │ Search  │ │ Solver  │ │  Shortcuts    │  │  │
│  │  └────┬────┘ └────┬────┘ └────┬────┘ └───────┬───────┘  │  │
│  │       └───────────┴───────────┴──────────────┘          │  │
│  │                         │                                │  │
│  │                    Action Selector                       │  │
│  └─────────────────────────┬───────────────────────────────┘  │
│                            │                                   │
│  ┌─────────────────────────▼───────────────────────────────┐  │
│  │                  Explanation Generator                   │  │
│  └─────────────────────────┬───────────────────────────────┘  │
│                            │                                   │
│                            ▼                                   │
│                     Final Action + Explanation                 │
└────────────────────────────────────────────────────────────────┘
```

What Makes This Better Than Current Bot?

Principled uncertainty handling - Belief states instead of random guessing
Learning capability - Neural net can improve from experience
Adaptive play - Exploits predictable opponents, plays safe against good ones
Game-theoretic soundness - CFR pushes toward low-exploitability strategies (the Nash-equilibrium guarantee holds only for two-player zero-sum games)
Coalition intelligence - Actual coordination, not just individual optimization
Scalable difficulty - From beginner-friendly to expert-level
Transparency - Explains decisions (great for teaching the game!)
