A production-grade visa processing time predictor built with Random Forest ML, confidence-aware estimates, and optional Gemini AI guidance.
- Confidence-aware predictions: not just a number, but a range with a High/Medium/Low confidence score derived from Random Forest tree agreement
- Gemini AI guidance: optional three-sentence practical insight per prediction via Google Gemini 1.5 Flash
- CSV retraining: upload your own historical visa records to retrain the model in-app (max 5 MB / 10,000 rows)
- Training diagnostics: MAE, R², and the countries and visa types covered
- Prediction history: last 100 predictions with CSV export
- Dark UI: clean, production-style Streamlit interface
| Layer | Technology |
|---|---|
| Frontend / App | Streamlit 1.45 |
| ML Model | Random Forest (scikit-learn 1.5) |
| Data | pandas, numpy |
| AI Insight | Google Gemini 1.5 Flash |
| Model persistence | joblib |
| Containerization | Docker |
visapredictor/
├── app.py # Streamlit UI — Predict, Train, Analytics tabs
├── predict.py # Prediction logic, confidence scoring, input validation
├── train.py # Training pipeline, feature engineering
├── sample_data.csv # Sample dataset for testing
├── rf_model.pkl # Trained Random Forest model (auto-generated)
├── country_encoder.pkl # LabelEncoder for countries (auto-generated)
├── visa_encoder.pkl # LabelEncoder for visa types (auto-generated)
├── requirements.txt # Pinned dependencies
├── Dockerfile # Container definition
└── .gitignore
git clone https://github.com/AkshatRaj00/visapredictor.git
cd visapredictor
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
pip install -r requirements.txt
python train.py
This generates rf_model.pkl, country_encoder.pkl, and visa_encoder.pkl.
streamlit run app.py
docker build -t visaiq .
docker run -p 8501:8501 visaiq
To enable AI-generated guidance per prediction:
- Get a free API key from Google AI Studio
- For Streamlit Cloud: add this line to .streamlit/secrets.toml: GOOGLE_API_KEY = "your-key-here"
- For local: set the environment variable: export GOOGLE_API_KEY="your-key-here"
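Because the guidance is optional, the app has to cope with the key being absent. A minimal sketch of that check, assuming the key is read from the environment; the helper name `get_gemini_api_key` is hypothetical and not taken from app.py:

```python
import os


def get_gemini_api_key():
    """Return the configured key, or None when AI guidance should stay off."""
    # Assumption: the app reads the same GOOGLE_API_KEY variable set above.
    key = os.environ.get("GOOGLE_API_KEY", "").strip()
    return key or None


# Predictions still work without a key; only the AI guidance is skipped.
ai_guidance_enabled = get_gemini_api_key() is not None
```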
Upload a CSV with these exact columns:
| Column | Format | Example |
|---|---|---|
| country | String | India |
| visa_type | String | Student |
| application_date | YYYY-MM-DD | 2024-01-15 |
| decision_date | YYYY-MM-DD | 2024-03-20 |
The model computes processing time (days) from decision_date - application_date and extracts the month as a seasonal feature.
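The derivation above can be sketched for a single row as follows. This is an illustration using only the standard library, not the code in train.py (which presumably works on a pandas DataFrame). The function name `engineer_features`, the output keys, and the choice of the application month (rather than the decision month) as the seasonal feature are assumptions.

```python
from datetime import date


def engineer_features(row: dict) -> dict:
    """Derive the training target and the seasonal feature from one CSV row."""
    applied = date.fromisoformat(row["application_date"])
    decided = date.fromisoformat(row["decision_date"])
    processing_days = (decided - applied).days
    if processing_days < 0:
        raise ValueError("decision_date precedes application_date")
    return {
        "country": row["country"],
        "visa_type": row["visa_type"],
        "processing_days": processing_days,
        # Assumption: seasonality is taken from the month of application.
        "month": applied.month,
    }


example = engineer_features({
    "country": "India",
    "visa_type": "Student",
    "application_date": "2024-01-15",
    "decision_date": "2024-03-20",
})
print(example["processing_days"], example["month"])  # 65 1
```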
Limits: Max 5 MB, max 10,000 rows per upload.
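A hedged sketch of how these limits and the required columns might be enforced before retraining. The function name `check_upload`, the error messages, and the reading of 5 MB as 5 × 1024 × 1024 bytes are assumptions, not the project's actual validation code.

```python
import csv
import io

MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" means binary megabytes
MAX_ROWS = 10_000
REQUIRED_COLUMNS = {"country", "visa_type", "application_date", "decision_date"}


def check_upload(raw: bytes) -> list[dict]:
    """Validate an uploaded CSV and return its rows as dictionaries."""
    if len(raw) > MAX_BYTES:
        raise ValueError("file exceeds 5 MB")
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    if len(rows) > MAX_ROWS:
        raise ValueError("file exceeds 10,000 rows")
    return rows


sample = (
    b"country,visa_type,application_date,decision_date\n"
    b"India,Student,2024-01-15,2024-03-20\n"
)
rows = check_upload(sample)
```

Checking the byte size first avoids parsing a file that is going to be rejected anyway.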
- Push code to GitHub
- Go to share.streamlit.io
- Connect repo → set main file as app.py
- In Advanced Settings → set Python 3.12
- Add GOOGLE_API_KEY in Secrets (optional)
⚠️ .pkl model files must be committed or generated at startup. Add a train.py call in your startup script if needed.
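One way to generate the files at startup is to train only when an artifact is missing. The sketch below assumes train.py can be run as a script from the project directory, as in the setup steps; the helper names `missing_artifacts` and `ensure_model` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

ARTIFACTS = ("rf_model.pkl", "country_encoder.pkl", "visa_encoder.pkl")


def missing_artifacts(base_dir="."):
    """List the model files that are not present in base_dir."""
    base = Path(base_dir)
    return [name for name in ARTIFACTS if not (base / name).exists()]


def ensure_model(base_dir="."):
    """Run train.py if any artifact is missing; return True if training ran."""
    if not missing_artifacts(base_dir):
        return False
    # Assumption: train.py writes all three files into the working directory.
    subprocess.run([sys.executable, "train.py"], cwd=base_dir, check=True)
    return True
```

Calling `ensure_model()` near the top of app.py, before the model is loaded, would cover a fresh deployment with no committed .pkl files.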
Confidence is derived from Random Forest tree agreement:
- Each of the 100 trees in the forest makes an individual prediction.
- Standard deviation of tree predictions is computed.
- Confidence = (1 - std_dev / max_expected_std) × 100, clamped to 0–100.
| Label | Confidence Range |
|---|---|
| 🟢 High | ≥ 70% |
| 🟡 Medium | 40–69% |
| 🔴 Low | < 40% |
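The formula and thresholds above can be sketched as follows. This is not the code in predict.py: the function names, the use of population standard deviation, and the example value of 30 days for `max_expected_std` are assumptions. With scikit-learn, the per-tree predictions could be collected by calling `predict` on each tree in the fitted forest's `estimators_` list.

```python
from statistics import pstdev


def confidence_score(tree_predictions, max_expected_std):
    """Map the spread of per-tree predictions to a 0-100 confidence score."""
    # Assumption: population standard deviation (the NumPy default).
    std_dev = pstdev(tree_predictions)
    score = (1 - std_dev / max_expected_std) * 100
    return max(0.0, min(100.0, score))  # clamp to 0-100


def confidence_label(score):
    """Translate a score into the label used in the table above."""
    if score >= 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"


# Five hypothetical trees that agree closely (values in days).
score = confidence_score([60, 62, 64, 66, 68], max_expected_std=30.0)
print(round(score, 1), confidence_label(score))  # 90.6 High
```

When the trees disagree by more than `max_expected_std`, the raw value goes negative and the clamp returns 0.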
- Fork the repo
- Create a feature branch: git checkout -b feat/your-feature
- Commit your changes
- Open a Pull Request
MIT License — see LICENSE for details.