Code Review Report
Reviewed by: Perplexity AI (requested by @AkshatRaj00)
Files reviewed: app.py, predict.py, train.py
Date: 2026-05-28
🔴 Critical Issues
1. API Key Hardcoding Risk (app.py)
# README says this (DANGEROUS)
GEMINI_API_KEY = "your_key_here"
The get_api_key() function is correct, but the README still shows the hardcoding pattern. A user who copies it from the README and commits the file leaks the key.
Fix: Remove the hardcoded example from the README and show only the secrets/environment lookup:
api_key = st.secrets.get("GOOGLE_API_KEY") or os.getenv("GOOGLE_API_KEY")
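The lookup order above can be sketched as a small helper. This is a minimal sketch, assuming the secrets store can be passed in as a plain mapping; `resolve_api_key` is a hypothetical name, not the app's `get_api_key()`. Depending on the Streamlit version, reading `st.secrets` with no `secrets.toml` present can raise, so the caller may need to guard that access.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    secrets: Optional[Mapping[str, str]] = None,
    name: str = "GOOGLE_API_KEY",
) -> Optional[str]:
    """Return the key from `secrets` if set, else from the environment, else None."""
    if secrets is not None:
        value = secrets.get(name)
        if value:
            return value
    # Empty strings are treated as "not configured".
    return os.getenv(name) or None
```

In the app this might be called as `resolve_api_key(st.secrets)`. A `None` result lets the UI disable the AI features instead of failing.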
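2. No Input Sanitization for CSV Upload (app.py, Train Tab)
df_train = pd.read_csv(uploaded_file)  # No size or row limits
A malicious or very large CSV could exhaust memory and crash the app on Streamlit Cloud.
Fix: Cap the rows while reading, so an oversized file is rejected before it is fully parsed:
MAX_ROWS = 100_000
df_train = pd.read_csv(uploaded_file, nrows=MAX_ROWS + 1)  # stop one row past the cap
if len(df_train) > MAX_ROWS:
    st.error(f"Dataset too large. Maximum {MAX_ROWS:,} rows allowed.")
    st.stop()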
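A byte-size and line-count check can also run before pandas touches the file. The sketch below assumes the uploaded object behaves like `io.BytesIO` (seekable and iterable by line). `upload_limit_error` and `MAX_BYTES` are hypothetical, and the 20 MB cap is only illustrative. Quoted fields containing newlines make the line count an overestimate of the row count, so the check errs on the strict side.

```python
import io
from typing import BinaryIO, Optional

MAX_BYTES = 20 * 1024 * 1024  # illustrative cap, not taken from the app
MAX_ROWS = 100_000


def upload_limit_error(
    file_obj: BinaryIO,
    max_bytes: int = MAX_BYTES,
    max_rows: int = MAX_ROWS,
) -> Optional[str]:
    """Return an error message if the upload exceeds a limit, else None.

    The stream is rewound to the start before returning.
    """
    file_obj.seek(0, io.SEEK_END)
    size = file_obj.tell()
    file_obj.seek(0)
    if size > max_bytes:
        return f"File too large: {size:,} bytes (limit {max_bytes:,})."

    # Count physical lines one at a time; the first line is assumed
    # to be the header, so counting starts at -1.
    data_rows = -1
    for _ in file_obj:
        data_rows += 1
        if data_rows > max_rows:
            break
    file_obj.seek(0)
    if data_rows > max_rows:
        return f"Dataset too large. Maximum {max_rows:,} rows allowed."
    return None
```

A non-`None` result could be passed to `st.error()` followed by `st.stop()`, and only a clean result would reach `pd.read_csv`.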
3. Broad except Exception Swallows All Errors (app.py)
try:
    ...
except Exception:
    return ""  # Silent failure: no logging, no user feedback
In get_ai_insight() and init_gemini(), silent catches make production failures nearly impossible to debug.
Fix: Surface the failure to the user (and ideally log it) before returning the fallback:
except Exception as e:
    st.warning(f"AI insight unavailable: {type(e).__name__}")
    return ""
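The warning alone still leaves no trace in the server logs. A minimal sketch of a wrapper that logs the traceback and hands the error name back to the caller follows; `call_with_fallback` and the logger name `app.ai` are hypothetical.

```python
import logging
from typing import Callable, Optional, Tuple

logger = logging.getLogger("app.ai")


def call_with_fallback(
    fn: Callable[[], str], fallback: str = ""
) -> Tuple[str, Optional[str]]:
    """Run `fn`; on failure log the traceback and return (fallback, error name)."""
    try:
        return fn(), None
    except Exception as exc:  # broad on purpose, but never silent
        logger.exception("AI insight call failed")
        return fallback, type(exc).__name__
```

The caller can then decide how to present the problem, for example by calling `st.warning()` only when the second element is not `None`.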
🟡 Medium Issues
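4. safe_predict() Confidence Formula Is a Hardcoded Heuristic
confidence = min(95.0, max(55.0, round((1 - min(days, 120) / 140) * 100, 1)))  # Magic numbers
This formula depends only on days, not on the model: it is capped at 95 for days up to 7, falls linearly after that, and is floored at 55 from 63 days onward, so the min(days, 120) cap never takes effect. Confidence should come from the actual model (e.g., predict_proba for a classifier, or the spread of per-tree predictions for a Random Forest regressor).
Fix: Use the spread of the RF's per-tree predictions, guarding against a zero mean and clipping to 0-100:
tree_preds = np.array([tree.predict(X)[0] for tree in rf_model.estimators_])
mean_pred = tree_preds.mean()
rel_spread = tree_preds.std() / abs(mean_pred) if mean_pred != 0 else np.inf
confidence = float(np.clip(100 * (1 - rel_spread), 0, 100))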
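The spread-based score can be isolated into a pure function that is easy to unit-test. This sketch uses the standard-library `statistics` module instead of NumPy to stay self-contained; `pstdev` is the population standard deviation, which matches `np.std` with its default `ddof=0`. The name `spread_confidence` is hypothetical, and the score measures how much the trees agree, not a calibrated probability.

```python
from statistics import fmean, pstdev
from typing import Sequence


def spread_confidence(tree_predictions: Sequence[float]) -> float:
    """Map per-tree predictions to a 0-100 score: 100 * (1 - std / |mean|), clipped."""
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    mean = fmean(tree_predictions)
    if mean == 0:
        return 0.0  # relative spread is undefined; report no confidence
    relative_spread = pstdev(tree_predictions) / abs(mean)
    return max(0.0, min(100.0, 100.0 * (1.0 - relative_spread)))
```

In the app, the input would be the list built from `rf_model.estimators_`. Identical tree outputs give 100, and a spread as large as the mean gives 0.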
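5. st.html() Used for Global CSS: Version-Sensitive Pattern
st.html("""
<style>...</style>
""")
st.html() is not deprecated, but it is aimed at HTML content, and how it treats style-only input has varied between Streamlit releases. st.markdown(..., unsafe_allow_html=True) is the long-established way to inject CSS and is the more predictable choice when the Streamlit version is pinned to an older release.
Fix:
st.markdown("<style>...</style>", unsafe_allow_html=True)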
6. Session State History Has No Size Limit
st.session_state.history.insert(0, {...})  # Grows indefinitely
Every prediction adds an entry, so a long-running session keeps growing in memory.
Fix:
MAX_HISTORY = 100
st.session_state.history.insert(0, entry)
st.session_state.history = st.session_state.history[:MAX_HISTORY]
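An alternative to slicing after every insert is a bounded `collections.deque`, which drops the oldest entry on its own. The sketch below is standalone; in the app the deque would be stored in `st.session_state.history`, and the entry shape shown is only a placeholder.

```python
from collections import deque

MAX_HISTORY = 100


def new_history(max_items: int = MAX_HISTORY) -> deque:
    """Create a newest-first history that never exceeds `max_items` entries."""
    return deque(maxlen=max_items)


# With maxlen set, appendleft() discards the item at the right end
# (the oldest entry) once the deque is full.
history = new_history(3)
for i in range(5):
    history.appendleft({"id": i})
```

After the loop, `history` holds the three newest entries with the most recent first.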
7. @st.cache_data on load_sample_data(): TTL Missing
@st.cache_data(show_spinner=False)  # Caches forever
def load_sample_data():
If sample_data.csv is updated, the cached version keeps being served until the cache is cleared or the app restarts.
Fix:
@st.cache_data(show_spinner=False, ttl=600) # 10 min cache
def load_sample_data():
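A TTL bounds staleness by time. Another option is to make the file's modification time part of the cache key, so the cache refreshes exactly when the file changes. The sketch below uses `functools.lru_cache` as a stand-in so it runs without Streamlit; the same idea should carry over to `@st.cache_data`, which also keys its cache on the function arguments. The names `_load_rows` and `load_sample_rows` are hypothetical.

```python
import csv
import functools
import os
from typing import Tuple

Rows = Tuple[Tuple[str, ...], ...]


@functools.lru_cache(maxsize=4)
def _load_rows(path: str, mtime_ns: int) -> Rows:
    # mtime_ns is not used in the body; it only makes the cache key
    # change whenever the file is modified.
    with open(path, newline="") as handle:
        return tuple(tuple(row) for row in csv.reader(handle))


def load_sample_rows(path: str) -> Rows:
    """Load CSV rows, re-reading the file only when its mtime changes."""
    return _load_rows(path, os.stat(path).st_mtime_ns)
```

The cost is one `os.stat` call per invocation, which is cheap next to re-parsing the CSV.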
🟢 Minor / Good Practices Missing
8. No requirements.txt Version Pinning
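The current requirements.txt likely has unpinned versions, which can break builds when an upstream package ships a breaking release.
Fix: Pin all versions (the numbers below are examples; use the versions from a known-good environment, e.g. the output of pip freeze):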
streamlit==1.35.0
google-generativeai==0.7.2
scikit-learn==1.5.0
pandas==2.2.2
numpy==1.26.4
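9. __pycache__ Missing from .gitignore
The repo currently has a __pycache__/ folder committed (visible in the file tree). Bytecode caches are machine-generated and should never be committed.
Fix: add the following to .gitignore, then untrack the already-committed folder with git rm -r --cached __pycache__ (ignore rules do not affect files that are already tracked):
__pycache__/
*.pyc
*.pyo
.env
# Optional: large binary model files
*.pkl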
10. No Loading State for Model Status on First Visit
If the model is not loaded, the UI shows an error but does not clearly guide the user to the Train tab first.
Fix: Add a visual onboarding banner:
if not MODEL_LOADED:
    st.info("👋 Welcome! No model found. Head to the **Train** tab to upload your CSV and train the model first.")
✅ What's Done Well
- Clean tab-based layout (Predict / Train / Analytics), very professional
- validate_payload() is a great defensive pattern
- confidence_badge() with color coding is a nice UX touch
- Sidebar model status indicator is clean
- Gemini integration with secrets-based API key is the right approach
- reload_models() after training is a good pattern
Summary Table
| # | Issue | Severity | File |
|---|-------|----------|------|
| 1 | API key hardcoding in README | 🔴 Critical | README.md |
| 2 | No CSV upload size limit | 🔴 Critical | app.py |
| 3 | Silent exception swallowing | 🔴 Critical | app.py |
| 4 | Heuristic confidence formula | 🟡 Medium | app.py |
| 5 | st.html() for CSS injection | 🟡 Medium | app.py |
| 6 | Unbounded session history | 🟡 Medium | app.py |
| 7 | Cache TTL missing | 🟡 Medium | app.py |
| 8 | Unpinned requirements | 🟢 Minor | requirements.txt |
| 9 | __pycache__ committed | 🟢 Minor | .gitignore |
| 10 | No onboarding for new users | 🟢 Minor | app.py |
💡 Overall: Solid production-style app with clean architecture. Fixing the three critical issues (API key, CSV limits, silent errors) would make it much safer to deploy. Replacing the heuristic confidence formula with a model-derived score would make the reported confidence far more trustworthy.
Reviewed for: @AkshatRaj00
Labels suggested: bug, enhancement, security