1. Hybrid Architecture: AlphaVinto
Instead of pure MCTS, combine multiple AI paradigms:
┌─────────────────────────────────────────────────────────┐
│                     Decision Engine                     │
├─────────────┬─────────────┬─────────────┬───────────────┤
│ Neural      │ MCTS        │ Rule-Based  │ Opponent      │
│ Evaluator   │ Search      │ Heuristics  │ Models        │
├─────────────┴─────────────┴─────────────┴───────────────┤
│              Unified Belief State Manager               │
├─────────────────────────────────────────────────────────┤
│                   Game State Observer                   │
└─────────────────────────────────────────────────────────┘
Why? Each component handles what it's best at:
Neural net: Fast pattern recognition, position evaluation
MCTS: Deep tactical search when needed
Rules: Handle known-optimal plays instantly (never swap away Joker)
Opponent models: Exploit predictable players
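One way the division of labour above could be wired together is a simple cascade: rules first, then the neural evaluator when it is confident, and search otherwise. This is only a sketch under assumptions; the names `Suggestion`, `DecisionSource`, and `decide`, and the 0.9 confidence cutoff, are hypothetical and not part of the design.

```typescript
// Hypothetical cascade over decision components (illustrative only).
type Action = string;

interface Suggestion {
  action: Action;
  confidence: number; // 0..1, how sure the component is
}

// Each component either proposes an action or abstains (null).
type DecisionSource = (state: unknown) => Suggestion | null;

function decide(
  state: unknown,
  rules: DecisionSource,
  neural: DecisionSource,
  search: DecisionSource,
  confidentEnough = 0.9, // assumed cutoff, would need tuning
): Action {
  // 1. Known-optimal plays short-circuit everything else.
  const ruleHit = rules(state);
  if (ruleHit !== null) return ruleHit.action;

  // 2. Accept the fast evaluator's choice only when it is confident.
  const quick = neural(state);
  if (quick !== null && quick.confidence >= confidentEnough) return quick.action;

  // 3. Otherwise pay for a deeper search.
  const searched = search(state);
  if (searched !== null) return searched.action;

  // 4. Fall back to the evaluator's best guess, if any.
  if (quick !== null) return quick.action;
  throw new Error("no component produced an action");
}
```

The ordering reflects cost: rule lookups are effectively free, a network forward pass is cheap, and tree search is the expensive last resort.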
2. Information Set MCTS (IS-MCTS)
Current approach: Random determinization (sample possible opponent cards randomly)
New approach: Belief-Weighted Particle Filtering
interface BeliefState {
  // For each unknown card location, maintain a probability distribution
  cardBeliefs: Map<CardLocation, CardDistribution>;
  // Particles representing possible game states
  particles: GameStateParticle[];
  // Remaining uncertainty in our beliefs (lower entropy = more confident)
  entropy: number;
}

interface CardDistribution {
  probabilities: Map<Card, number>; // Sum to 1.0
  lastUpdated: number;
  evidenceHistory: Evidence[];
}

class BeliefStateManager {
  private particles: GameStateParticle[] = [];

  // Update beliefs based on observations
  updateFromAction(action: GameAction): void {
    // If opponent draws and keeps: update belief toward valuable cards
    // If opponent swaps: strong signal about relative values
    // If opponent peeks and reacts: information about what they saw
    this.particles = this.particles
      .map(p => this.updateParticle(p, action))
      .filter(p => p.weight > THRESHOLD); // THRESHOLD: minimum weight worth keeping
    this.resampleIfNeeded();
  }

  // Sample game states weighted by belief probability
  sampleDeterminization(): ConcreteGameState {
    const particle = this.weightedSample(this.particles);
    return particle.toConcreteState();
  }
}
Key insight: Instead of treating all unknown cards as equally likely, track evidence and maintain probability distributions.
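The weighted sampling and resampling steps that a particle-based belief state relies on could look roughly like the following. This is a minimal sketch, not the project's implementation: the `Particle` type and the helper names are hypothetical, multinomial resampling is just one of several possible schemes, and the random source is injectable purely so the behaviour can be checked deterministically.

```typescript
// Minimal weighted-particle helpers (illustrative; not the project's real types).
interface Particle<S> {
  state: S;
  weight: number; // non-negative, not necessarily normalized
}

// Scale weights so they sum to 1. Throws if every particle has zero weight.
function normalize<S>(particles: Particle<S>[]): Particle<S>[] {
  const total = particles.reduce((sum, p) => sum + p.weight, 0);
  if (total <= 0) throw new Error("all particles have zero weight");
  return particles.map(p => ({ state: p.state, weight: p.weight / total }));
}

// Effective sample size: 1 / sum(w^2) for normalized weights.
// Equals N when weights are uniform and approaches 1 when one particle dominates.
function effectiveSampleSize<S>(normalized: Particle<S>[]): number {
  return 1 / normalized.reduce((sum, p) => sum + p.weight * p.weight, 0);
}

// Draw one particle with probability proportional to its weight.
function weightedSample<S>(
  normalized: Particle<S>[],
  rand: () => number = Math.random,
): Particle<S> {
  let r = rand();
  for (let i = 0; i < normalized.length; i++) {
    r -= normalized[i].weight;
    if (r < 0) return normalized[i];
  }
  return normalized[normalized.length - 1]; // guard against rounding error
}

// Multinomial resampling: draw N particles with replacement, reset weights to 1/N.
function resample<S>(
  normalized: Particle<S>[],
  rand: () => number = Math.random,
): Particle<S>[] {
  const n = normalized.length;
  const out: Particle<S>[] = [];
  for (let i = 0; i < n; i++) {
    out.push({ state: weightedSample(normalized, rand).state, weight: 1 / n });
  }
  return out;
}
```

A common trigger for resampling is the effective sample size dropping below some fraction of the particle count (N/2 is a frequent choice); the right threshold for this game would need experimentation.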
3. Counterfactual Regret Minimization (CFR) for Equilibrium Play
For situations where game theory matters (bluffing, Vinto timing):
class CFRTrainer {
  // Regret tables: "How much do I regret not taking action A in state S?"
  regrets: Map<InfoSetKey, Map<Action, number>> = new Map();
  // Strategy tables: Current mixed strategy
  strategy: Map<InfoSetKey, Map<Action, number>> = new Map();

  // Train through self-play
  async train(iterations: number): Promise<void> {
    for (let i = 0; i < iterations; i++) {
      // Play game against self (initialState: the starting position, defined elsewhere)
      const utilities = await this.traverseGameTree(initialState, [], []);
      // Update regrets based on "what if I had played differently?"
      this.updateRegrets(utilities);
      // Update strategy using regret matching
      this.updateStrategy();
    }
  }

  // Get action probabilities for a state
  getStrategy(infoSet: InfoSetKey): Map<Action, number> {
    // Returns the mixed strategy (probabilities over actions).
    // In two-player zero-sum games the average strategy approaches a Nash
    // equilibrium as training progresses; with more players there is no such guarantee.
    return this.strategy.get(infoSet) ?? new Map();
  }
}
Use cases:
When to call Vinto (optimal timing is a mixed strategy!)
Whether to use Queen on self vs opponent
King declaration target selection (sometimes suboptimal play confuses opponents)
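The regret-matching step at the heart of CFR can be sketched for a single information set as follows. The function names are hypothetical and the sketch deliberately omits the game-tree traversal and the averaging of strategies across iterations, both of which a real trainer would need.

```typescript
// Regret matching for one information set (illustrative sketch).
// Each action gets probability proportional to its positive cumulative regret.
function regretMatching<A>(regrets: Map<A, number>): Map<A, number> {
  let total = 0;
  regrets.forEach(regret => {
    total += Math.max(regret, 0);
  });
  const strategy = new Map<A, number>();
  regrets.forEach((regret, action) => {
    // With no positive regret anywhere, fall back to the uniform strategy.
    strategy.set(action, total > 0 ? Math.max(regret, 0) / total : 1 / regrets.size);
  });
  return strategy;
}

// After one iteration: regret(a) += u(a) - u(current strategy).
function accumulateRegrets<A>(
  regrets: Map<A, number>,
  actionUtilities: Map<A, number>,
  strategy: Map<A, number>,
): void {
  let expected = 0;
  actionUtilities.forEach((utility, action) => {
    expected += (strategy.get(action) ?? 0) * utility;
  });
  actionUtilities.forEach((utility, action) => {
    regrets.set(action, (regrets.get(action) ?? 0) + utility - expected);
  });
}
```

For example, cumulative regrets of 3, 1 and -2 for three actions yield probabilities 0.75, 0.25 and 0: the action with negative regret is never played until its regret turns positive again.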
4. Deep Opponent Modeling
interface PlayerProfile {
  // Play style classification
  aggression: number; // 0-1: How quickly they call Vinto
  riskTolerance: number; // 0-1: Willingness to take uncertain swaps
  bluffFrequency: number; // 0-1: How often they misrepresent hand strength
  // Behavioral patterns
  patterns: {
    // "When they peek their own card, they swap it X% of the time"
    peekOwnThenSwap: number;
    // "When they have King, they use it within N turns"
    kingUsageSpeed: number;
    // "They call Vinto at average score of X"
    vintoThreshold: number;
    // "They prioritize removing cards over point reduction"
    removalVsPoints: number;
  };
  // Historical accuracy
  predictionHistory: PredictionResult[];
  modelConfidence: number;
}

// Signatures only, so this is declared as an interface; implementations are left open
interface OpponentProfiler {
  // Build profile from observations
  updateProfile(player: PlayerId, action: GameAction, context: GameContext): void;
  // Predict likely actions
  predictAction(player: PlayerId, state: GameState): ActionDistribution;
  // Detect if opponent is exploitable
  findExploits(profile: PlayerProfile): ExploitStrategy[];
}
Advanced feature: Opponent Model Selection
// Maintain multiple hypothesis models per opponent
class MultiModelTracker {
  models: Map<PlayerId, PlayerProfile[]> = new Map();
  modelWeights: Map<PlayerId, number[]> = new Map();

  // Bayesian update: which model best explains observed behavior?
  updateModelWeights(player: PlayerId, action: GameAction): void {
    const likelihoods = this.models.get(player)!
      .map(model => this.actionLikelihood(action, model));
    // Update weights proportional to likelihood
    this.modelWeights.set(player,
      this.bayesianUpdate(this.modelWeights.get(player)!, likelihoods)
    );
  }

  // Use best model(s) for prediction
  getBestModel(player: PlayerId): PlayerProfile {
    // Returns the single most probable model; a weighted mixture is an alternative
    const models = this.models.get(player)!;
    const weights = this.modelWeights.get(player)!;
    return models[weights.indexOf(Math.max(...weights))];
  }
}
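The `bayesianUpdate` step used by the tracker is not spelled out; one plausible reading is the standard posterior-proportional-to-prior-times-likelihood rule, sketched below. The `floor` parameter is an assumption added here so that a single surprising action cannot permanently eliminate a hypothesis; it is not part of the original design.

```typescript
// Posterior is proportional to prior * likelihood, renormalized to sum to 1.
// `floor` (hypothetical) keeps every hypothesis at a small non-zero weight.
function bayesianUpdate(priors: number[], likelihoods: number[], floor = 1e-6): number[] {
  if (priors.length !== likelihoods.length) {
    throw new Error("priors and likelihoods must have the same length");
  }
  const unnormalized = priors.map((prior, i) => Math.max(prior * likelihoods[i], floor));
  const total = unnormalized.reduce((sum, w) => sum + w, 0);
  return unnormalized.map(w => w / total);
}
```

With two equally weighted models and likelihoods of 0.8 and 0.2 for the observed action, the weights move to roughly 0.8 and 0.2; repeated observations compound, so a consistently better-fitting model quickly dominates.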
5. Neural Network State Evaluator
Instead of hand-crafted evaluation functions:
interface NeuralEvaluator {
  // Input features
  encodeState(state: GameState, perspective: PlayerId): Float32Array;
  // Output
  evaluate(encoding: Float32Array): {
    winProbability: number; // P(win game)
    expectedScore: number; // E[final score]
    actionValues: Map<Action, number>; // Q-values for each action
  };
}

// Feature encoding (example)
function encodeState(state: GameState, me: PlayerId): Float32Array {
  return new Float32Array([
    // My known cards (one-hot encoded: 13 ranks × 4 suits × 4 positions)
    ...encodeKnownCards(state, me),
    // My score estimate
    state.getScore(me) / 50, // Normalized
    // Opponent scores
    ...state.opponents(me).map(o => state.getScore(o) / 50),
    // Cards remaining in deck
    state.deckSize / 52,
    // Turn number
    state.turnNumber / 30,
    // Phase encoding
    state.phase === 'main' ? 1 : 0,
    state.phase === 'final' ? 1 : 0,
    // Who called Vinto (if any)
    ...encodeVintoCaller(state, me),
    // Belief state summary
    ...encodeBeliefs(state.beliefs, me),
    // Recent action history
    ...encodeRecentActions(state, 5),
  ]);
}
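The one-hot card encoding mentioned in the feature encoder (13 ranks × 4 suits × 4 positions) could be sketched as below. The `KnownCard` shape, the `oneHotHand` name, and the simplified signature (a plain array of known-or-unknown slots instead of a full game state) are assumptions for illustration; Jokers and hands of other sizes are not handled.

```typescript
const RANKS = 13;
const SUITS = 4;
const POSITIONS = 4;

// Hypothetical card shape: rank in 0..12, suit in 0..3.
interface KnownCard {
  rank: number;
  suit: number;
}

// Encodes up to POSITIONS hand slots; `null` means the card is unknown.
// Each slot gets RANKS * SUITS entries, with a single 1 at the known card's index.
function oneHotHand(hand: (KnownCard | null)[]): Float32Array {
  const perSlot = RANKS * SUITS;
  const out = new Float32Array(POSITIONS * perSlot); // zero-initialized
  hand.slice(0, POSITIONS).forEach((card, position) => {
    if (card === null) return; // unknown card: leave the slot all zeros
    out[position * perSlot + card.suit * RANKS + card.rank] = 1;
  });
  return out;
}
```

Leaving unknown slots as all zeros lets the network distinguish "no information" from any specific card, at the cost of a fairly sparse 208-element vector.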
Training approach:
Self-play to generate games
Train to predict game outcome from any position
Use as evaluation function in MCTS
Fine-tune on edge cases
6. Hierarchical Decision Making
Structure decisions at multiple levels:
// Strategic layer: What are we trying to achieve?
interface Strategy {
  goal: 'minimize_score' | 'prevent_opponent_vinto' | 'setup_vinto' | 'support_coalition';
  confidence: number;
  horizon: number; // How many turns to plan
}

// Tactical layer: How do we achieve it this turn?
interface TacticalPlan {
  strategy: Strategy;
  immediateActions: Action[];
  contingencies: Map<Observation, Action[]>;
}

// Execution layer: Carry out the plan
class HierarchicalBot {
  strategyPlanner: StrategyPlanner;
  tacticalPlanner: TacticalPlanner;
  executor: ActionExecutor;

  async decideAction(state: GameState): Promise<Action> {
    // 1. Update or maintain current strategy
    const strategy = await this.strategyPlanner.evaluate(state);
    // 2. Generate tactical plan for this strategy
    const plan = await this.tacticalPlanner.plan(state, strategy);
    // 3. Execute first action of plan
    return this.executor.execute(plan);
  }
}
Example strategies:
const strategies = {
  rushVinto: {
    // Aggressively minimize score to call Vinto early
    prioritize: ['remove_cards', 'minimize_points'],
    vintoThreshold: 8,
  },
  controlGame: {
    // Accumulate information and action cards
    prioritize: ['gather_info', 'retain_action_cards'],
    vintoThreshold: 5,
  },
  blockOpponent: {
    // Prevent specific opponent from winning
    prioritize: ['increase_target_score', 'deny_low_cards'],
    targetPlayer: 'lowest_score',
  },
  coalitionChampion: {
    // Final round: I'm the one who needs to win
    prioritize: ['minimize_own_score', 'receive_support'],
  },
  coalitionSupport: {
    // Final round: Help the champion
    prioritize: ['transfer_resources', 'attack_vinto_caller'],
  },
};
7. Implicit Communication in Coalition Play
Players can't talk, but they can signal through actions:
interface SignalingProtocol {
  // Encoding intentions through play
  signals: {
    // "I have a Joker" - demonstrated by keeping a card after King declaration
    jokerSignal: (action: GameAction) => boolean;
    // "I can help you" - peeking coalition member's card
    supportSignal: (action: GameAction) => boolean;
    // "Attack this player" - using Queen on specific opponent
    targetSignal: (action: GameAction) => PlayerId | null;
    // "I'm ready to be champion" - specific swap patterns
    championSignal: (action: GameAction) => boolean;
  };
  // Decoding coalition member actions
  interpretAction(player: PlayerId, action: GameAction): IntentEstimate;
  // Planning actions that communicate intent
  generateSignalingAction(intent: Intent, availableActions: Action[]): Action;
}
8. Adaptive Difficulty & Personality System
Make the bot customizable:
interface BotPersonality {
  name: string;
  // Skill settings
  searchDepth: number;
  beliefAccuracy: number; // How well it tracks cards
  mistakeRate: number; // Occasionally suboptimal plays
  // Style settings
  aggression: number;
  riskTolerance: number;
  socialPlay: number; // Coalition cooperation level
  // Behavioral quirks
  quirks: {
    favoriteCard?: Rank; // Slight preference for keeping this
    hatesToLose?: boolean; // Takes risks when behind
    showsOff?: boolean; // Prefers flashy plays
  };
}

const personalities: BotPersonality[] = [
  {
    name: "Cautious Carl",
    searchDepth: 3,
    beliefAccuracy: 0.7,
    mistakeRate: 0.05,
    aggression: 0.3,
    riskTolerance: 0.2,
    socialPlay: 0.8,
    quirks: { hatesToLose: false },
  },
  {
    name: "Aggressive Anna",
    searchDepth: 4,
    beliefAccuracy: 0.8,
    mistakeRate: 0.02,
    aggression: 0.9,
    riskTolerance: 0.7,
    socialPlay: 0.5,
    quirks: { hatesToLose: true },
  },
  // ... more personalities
];
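How a setting such as `mistakeRate` might actually influence play is left open above. One simple interpretation, sketched here under assumptions, is to pick the best-scored action most of the time and a uniformly random non-best action with probability `mistakeRate`. The `ScoredAction` type and `chooseWithMistakes` name are hypothetical, and other schemes (for instance softmax sampling over action values) would be equally reasonable.

```typescript
// Hypothetical pairing of an action with its evaluated value (higher is better).
interface ScoredAction<A> {
  action: A;
  value: number;
}

// Returns the best action, except with probability `mistakeRate`,
// when a uniformly random non-best action is returned instead.
function chooseWithMistakes<A>(
  scored: ScoredAction<A>[],
  mistakeRate: number,
  rand: () => number = Math.random,
): A {
  if (scored.length === 0) throw new Error("no actions available");
  // Sort a copy so the caller's array is left untouched.
  const ranked = scored.slice().sort((a, b) => b.value - a.value);
  if (ranked.length === 1 || rand() >= mistakeRate) return ranked[0].action;
  // Deliberate mistake: pick among ranks 1..n-1.
  const index = 1 + Math.floor(rand() * (ranked.length - 1));
  return ranked[Math.min(index, ranked.length - 1)].action;
}
```

With `mistakeRate: 0.05`, "Cautious Carl" would deviate from his best-evaluated move about once every twenty decisions under this scheme.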
9. Explainable Decisions
Help players understand (and learn from) the bot:
interface DecisionExplanation {
  action: Action;
  confidence: number;
  reasoning: {
    // Why this action?
    primaryReason: string;
    // What alternatives were considered?
    alternatives: Array<{
      action: Action;
      reason: string;
      whyRejected: string;
    }>;
    // Key factors
    factors: Array<{
      name: string;
      value: number;
      impact: 'positive' | 'negative';
    }>;
  };
  // Natural language
  humanReadable: string;
}

// Example output:
{
  action: { type: 'swap', position: 2, discardedCard: '7♠' },
  confidence: 0.85,
  reasoning: {
    primaryReason: "Drew a 3, which is lower than my known 7 at position 2",
    alternatives: [
      {
        action: { type: 'discard' },
        reason: "Keep current hand",
        whyRejected: "7 is too high, reducing score is priority"
      }
    ],
    factors: [
      { name: 'point_reduction', value: 4, impact: 'positive' },
      { name: 'card_knowledge', value: 0.8, impact: 'positive' },
      { name: 'cascade_potential', value: 2, impact: 'positive' },
    ]
  },
  humanReadable: "I swapped my 7 for the 3 I drew. This saves 4 points, and the 7 I discarded might trigger a toss-in cascade!"
}
10. Architecture Overview
┌────────────────────────────────────────────────────────────────┐
│ VintoBot 2.0 │
├────────────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Belief │ │ Opponent │ │ Strategy │ │
│ │ State │◄─┤ Profiler │◄─┤ Selector │ │
│ │ Manager │ │ │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Decision Engine │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────────┐ │ │
│ │ │ Neural │ │ MCTS │ │ CFR │ │ Rule-Based │ │ │
│ │ │ Eval │ │ Search │ │ Solver │ │ Shortcuts │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └───────┬───────┘ │ │
│ │ └───────────┴───────────┴──────────────┘ │ │
│ │ │ │ │
│ │ Action Selector │ │
│ └─────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────▼───────────────────────────────┐ │
│ │ Explanation Generator │ │
│ └─────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ▼ │
│ Final Action + Explanation │
└────────────────────────────────────────────────────────────────┘
What Makes This Better Than Current Bot?
Principled uncertainty handling - Belief states instead of random guessing
Learning capability - Neural net can improve from experience
Adaptive play - Exploits predictable opponents, plays safe against good ones
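Game-theoretic soundness - CFR pushes strategies toward low exploitability (convergence to equilibrium is guaranteed only for two-player zero-sum games; with more players it is a heuristic)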
Coalition intelligence - Actual coordination, not just individual optimization
Scalable difficulty - From beginner-friendly to expert-level
Transparency - Explains decisions (great for teaching the game!)