This repository contains my solution for the Wundernn Challenge, a machine learning competition focused on predicting future market states from historical sequences. The challenge involves building a model that can forecast the next state vector in a sequence based on past observations.
Predict the next market state vector based on a sequence of historical states using sequence modeling techniques.
- Sequence Length: Each sequence is exactly 1000 steps
- Warm-up Period: First 100 steps (0-99) are for context building only
- Scoring Range: Predictions are evaluated on steps 100-998
- Evaluation Metric: R² (coefficient of determination) score, computed per feature (see the sketch after this list)
- Data Structure: N anonymized numeric features describing market states
- Independence: Each sequence is independent and must reset the model's internal state
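
To make the evaluation protocol concrete, here is a minimal sketch of how the warm-up and scoring ranges combine with the per-feature R² metric; the array names and shapes are assumptions for illustration, not part of the official scoring code.

```python
import numpy as np
from sklearn.metrics import r2_score

WARMUP_END = 100   # steps 0-99 are context only and are not scored
SCORE_STOP = 999   # slice end, so steps 100-998 inclusive are scored

def scored_r2(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Per-feature R² over the scored range of one sequence.

    Both arrays are assumed to have shape (1000, n_features).
    """
    # multioutput="raw_values" returns one R² value per feature column
    return r2_score(y_true[WARMUP_END:SCORE_STOP],
                    y_pred[WARMUP_END:SCORE_STOP],
                    multioutput="raw_values")
```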
The solution uses an advanced Transformer architecture with the following components:
- Advanced Positional Encoding: Combines sinusoidal and learnable positional embeddings (sketched after this list)
- Multi-Head Self-Attention: Captures complex dependencies in market sequences
- Position-wise Attention: Adaptively weights position importance
- Residual Connections: Improves gradient flow during training
- Layer Normalization: Stabilizes training process
- Dropout Regularization: Prevents overfitting
- Handles variable-length sequences up to 1000 steps
- Efficient attention mechanisms for long sequence processing
- Adaptive learning mechanisms for position encoding
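
As an illustration of the combined positional encoding and encoder stack described above, here is a minimal PyTorch sketch; the module name, dimensions, and hyperparameters are assumptions, not the exact implementation in `src/solution.py`.

```python
import math
import torch
import torch.nn as nn

class CombinedPositionalEncoding(nn.Module):
    """Sinusoidal table plus a learnable positional embedding (illustrative)."""

    def __init__(self, d_model: int, max_len: int = 1000):
        super().__init__()
        # Fixed sinusoidal table of shape (max_len, d_model); assumes even d_model.
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("sinusoidal", pe)
        # Learnable embedding that can adapt to the specific market dynamics.
        self.learned = nn.Embedding(max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.sinusoidal[: x.size(1)] + self.learned(positions)

# The encoded sequence can then feed a standard encoder stack, which provides
# multi-head self-attention, residual connections, layer norm, and dropout.
encoder_layer = nn.TransformerEncoderLayer(d_model=128, nhead=8, dropout=0.1, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
```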
| Metric | Value |
|---|---|
| Mean R² Score | 0.396 |
| Best Performing Feature | Feature 7 (R² = 0.536) |
| Worst Performing Feature | Feature 21 (R² = 0.259) |
| Total Features | 32 |
| Model Type | Transformer |

The five best-performing features by R² score:

| Feature | R² Score |
|---|---|
| 7 | 0.536 |
| 14 | 0.521 |
| 6 | 0.519 |
| 13 | 0.502 |
| 11 | 0.501 |
The loss curve shows steady improvement and well-behaved convergence. Overfitting is minimal, with validation loss stabilizing after approximately 20 epochs.
The predictions visualization demonstrates the model's ability to capture market state trends and variations.
The residuals plot shows the model's prediction errors across the validation set, indicating relatively balanced performance across different market states.
- Scalability: Transformer architecture scales well with sequence length
- Parallelization: Self-attention allows for efficient parallel processing
- Long-Range Dependencies: Multi-head attention captures complex patterns across the entire sequence
- Adaptability: Learnable positional encodings adjust to specific market dynamics
- Regularization: Built-in mechanisms prevent overfitting on training data
- Split sequences by `seq_ix` to ensure independence (see the sketch after this list)
- 80/20 train/validation split
- Proper sequence isolation to avoid data leakage
- Standardization of market state features
- Proper handling of sequence boundaries
- Internal state reset for each new sequence
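
A minimal sketch of the sequence-level split and standardization steps, assuming the raw data is loaded as a pandas DataFrame with a `seq_ix` column plus numeric feature columns; the function and column handling are illustrative, not the exact preprocessing code.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def split_and_scale(df: pd.DataFrame, val_frac: float = 0.2, seed: int = 42):
    """Split whole sequences into train/validation sets, then standardize features."""
    rng = np.random.default_rng(seed)
    seq_ids = df["seq_ix"].unique()
    rng.shuffle(seq_ids)

    n_val = int(len(seq_ids) * val_frac)
    val_ids = set(seq_ids[:n_val])

    # Entire sequences go to one side of the split, so no leakage across steps.
    val_df = df[df["seq_ix"].isin(val_ids)]
    train_df = df[~df["seq_ix"].isin(val_ids)]

    feature_cols = [c for c in df.columns if c != "seq_ix"]
    scaler = StandardScaler().fit(train_df[feature_cols])
    train_scaled = scaler.transform(train_df[feature_cols])
    val_scaled = scaler.transform(val_df[feature_cols])
    return train_scaled, val_scaled, scaler
```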
- Optimizer: Adam with adaptive learning rate (training loop sketched after this list)
- Loss Function: Mean Squared Error (MSE)
- Batch Size: Optimized for GPU memory utilization
- Epochs: Trained until convergence with early stopping
- Device: GPU acceleration enabled
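
The training configuration above can be summarized in a compact loop; the concrete hyperparameter values below (learning rate, patience, epoch cap) are placeholders, not the tuned values used for the reported results.

```python
import copy
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, max_epochs=100, patience=10, lr=1e-3):
    """Illustrative loop: Adam + MSE with early stopping on validation loss."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()

    best_val, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                           for x, y in val_loader) / len(val_loader)

        if val_loss < best_val:
            best_val, best_state, bad_epochs = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # early stopping: no improvement for `patience` epochs

    model.load_state_dict(best_state)
    return model
```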
├── src/
│ └── solution.py # Main model implementation
├── results/
│ ├── loss_curve.png # Training/validation loss history
│ ├── predictions.png # Prediction visualization
│ ├── residuals.png # Residuals analysis
│ ├── feature_scores.csv # Per-feature R² scores
│ ├── loss_history.csv # Epoch-by-epoch loss history
│ ├── summary.csv # Overall performance summary
│ └── solution_submission.zip
├── utils.py # Utility functions and DataPoint class
├── model_training.ipynb # Training notebook and analysis
└── README.md # This file
- Python 3.8+
- PyTorch
- NumPy
- Pandas
- scikit-learn
from src.solution import PredictionModel
import numpy as np
# Initialize model
model = PredictionModel()
# Make predictions on data points
prediction = model.predict(data_point)  # data_point: a DataPoint instance (see utils.py)

Key findings:

- Feature Heterogeneity: Different features show varied predictability, with feature 7 the most predictable (R² = 0.536) and feature 21 the most challenging (R² = 0.259)
- Model Convergence: The training curve shows stable convergence without significant overfitting, indicating good generalization
- Sequence Context: The model effectively uses the warm-up period (steps 0-99) to build contextual representations for accurate future predictions
The solution is packaged as `solution_submission.zip` containing:

- `solution.py` with the `PredictionModel` class (interface sketched below)
- A proper implementation of the `predict(data_point: DataPoint)` method
- All required dependencies properly imported
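
For illustration, a skeleton of the expected entry point with per-sequence state handling; the `DataPoint` fields accessed here (`seq_ix`, `state`) are hypothetical names, not necessarily the attributes defined in `utils.py`, and the actual model call is omitted.

```python
import numpy as np

class PredictionModel:
    """Skeleton of the submission interface (illustrative, not the actual src/solution.py)."""

    def __init__(self):
        self._history = []        # buffer of observed states for the current sequence
        self._current_seq = None  # identifier of the sequence being processed

    def predict(self, data_point) -> np.ndarray:
        # Reset internal state whenever a new, independent sequence begins.
        # (Field names below are hypothetical; the real DataPoint lives in utils.py.)
        if data_point.seq_ix != self._current_seq:
            self._history.clear()
            self._current_seq = data_point.seq_ix

        self._history.append(np.asarray(data_point.state, dtype=np.float32))

        # Placeholder: repeat the last observed state. The real implementation
        # would run the Transformer on the buffered history instead.
        return self._history[-1]
```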
- Ensemble Methods: Combine multiple model architectures
- Feature Engineering: Derive additional features from raw sequences
- Hyperparameter Tuning: Further optimize learning rates and architecture parameters
- Advanced Architectures: Explore state-of-the-art models like Mamba-2
- Temporal Augmentation: Apply data augmentation techniques for sequence data
This solution demonstrates the effectiveness of Transformer-based architectures for market state prediction tasks. The model successfully captures temporal dependencies in market sequences and achieves meaningful predictive performance across multiple features.


