Date: 2025-11-18 Assessment Type: Comprehensive Package Review Overall Grade: D+ (60% - Functional prototype, not production-ready)
Functional prototype with significant gaps
The MNLNP package successfully demonstrates core concepts for comparing MNL and MNP models, but falls short of production quality in several critical areas. While basic functionality works and tests pass, the package relies on fabricated benchmark data, has statistical validity concerns, and lacks essential documentation.
✅ STRENGTHS:
- Core model fitting functions work reliably
- Robust error handling and fallback mechanisms
- Flexible handling of various model specifications
- Good test coverage for basic functionality (27 tests passing)
- Honest labeling of limitations
❌ CRITICAL WEAKNESSES:
- Fake benchmark data undermines entire recommendation system
- test_iia() statistical validity concerns (simplified implementation)
- No vignettes or comprehensive documentation
- Untested edge cases (missing data, rare alternatives, large models)
- MNP dependency not properly managed
Users could make poor methodological choices based on fabricated benchmarks or misinterpret results due to inadequate documentation.
Severity: CRITICAL Impact: Undermines package credibility
Problem:
The entire recommendation system (recommend_model(), quick_decision(), required_sample_size()) relies on mnl_mnp_benchmark.rda which contains fabricated data:
- Convergence rates are "educated guesses," not empirical findings
- Win rates and RMSE values are simulated fiction
n_replications = 0explicitly marks data as fake- Dataset labeled
data_type = "illustrative_placeholder"
Example Risk:
recommend_model(n = 300, correlation = 0.5)
# Claims: "74% MNP convergence at n=250"
# Reality: Could be 20% or 95% - WE DON'T ACTUALLY KNOWWhat Users Might Do:
- Trust convergence estimates and choose wrong model
- Cite package recommendations in publications
- Base research design on false assumptions
- Waste time/money collecting wrong sample sizes
Mitigation Status:
- ✅ Data clearly labeled with warnings
- ✅ Script ready to generate real benchmarks (
run_pilot_benchmark.R) - ❌ Real benchmarks not yet run
Recommendation: Run pilot simulation immediately (1,800 reps, ~2 hours) before any serious use.
Severity: HIGH Impact: Could produce misleading test results
Issues Identified:
Code Evidence:
# R/iia_and_decision.R, lines 131-135
# For valid Hausman test, need same dimension
# Simplified: use only coefficients that match
min_len <- min(length(full_vec), length(restricted_vec))
full_vec <- full_vec[1:min_len]
restricted_vec <- restricted_vec[1:min_len]Problem: The code explicitly says "Simplified" - this is NOT a rigorous Hausman test implementation. Proper implementation requires:
- Careful matching of corresponding coefficients
- Proper handling of different alternative structures
- Correct variance-covariance matrix construction
Observed: "Variance difference matrix not positive definite" appears regularly
Implication: The variance-covariance matrix difference should be positive definite for valid Hausman test. When it's not, the test statistic may be invalid. Current code:
- Adds small diagonal values (1e-6) to force positive definiteness
- Falls back to simplified chi-squared calculation
Risk: Test may produce:
- False positives (rejecting IIA when it holds)
- False negatives (failing to reject when IIA violated)
- Unreliable p-values
Documentation says: "Implements the Hausman-McFadden test" Reality: Implements simplified approximation with fallbacks
Recommendation:
- Get econometrician review of implementation
- Add explicit warnings about simplified approach
- Consider using established packages (mlogit has IIA test)
- Validate against known IIA violations
Severity: HIGH Impact: Users won't understand how to use package properly
Missing Components:
| Component | Status | Impact |
|---|---|---|
| Package vignette | ❌ None | Users have no introduction |
| Function examples | Don't show real-world usage | |
| Interpretation guide | ❌ None | Users may misinterpret results |
| Workflow tutorials | ❌ None | Don't know where to start |
| ?mnlChoice help | ❌ None | Package-level docs missing |
Example Problem: Current documentation shows:
#' @examples
#' \dontrun{
#' dat <- generate_choice_data(n = 300, correlation = 0.4, seed = 123)
#' iia_test <- test_iia(choice ~ x1 + x2, data_obj = dat$data)
#' }What's Wrong: Users need examples showing:
- How to prepare their own data
- How to interpret test results
- What to do when tests fail
- Real datasets from different domains
Needed:
- "Getting Started" vignette
- "Interpreting IIA Tests" guide
- "Model Selection Workflow" tutorial
- "Publishing Results" guide with publication_table() examples
Severity: MEDIUM-HIGH Impact: Package may crash or give wrong results
What Happens When:
dat <- data.frame(choice = factor(c(rep("A", 50), rep("B", 50))), x1 = rnorm(100))
test_iia(choice ~ x1, data_obj = dat)
# Error: "IIA test requires at least 3 alternatives"Problem: Error message is technically correct but unhelpful. Should say: "IIA test not applicable to binary choice. Use binary logit/probit instead."
# One alternative has only 5 observations
test_iia(..., omit_alternative = "Rare")
# Warning: "Restricted dataset very small (n < 50)"
# But still runs!Problem: Test proceeds despite insufficient data. Should refuse to run.
dat$x1[5] <- NA
test_iia(choice ~ x1, data_obj = dat)
# Crashes with cryptic errorProblem: No missing data handling anywhere in package.
Problem: MNL won't converge but package doesn't detect or handle this.
Problem: Tested only up to 10 predictors. May fail or produce garbage with many predictors.
Recommendation: Add input validation and informative error messages for all edge cases.
Severity: MEDIUM Impact: Silent failures, misleading results
Problem: MNP package is in "Suggests" not "Depends", but functions assume it's available:
compare_mnl_mnp(...) # Claims to compare BOTH models
# If MNP not installed: silently falls back to MNL-only
# User may think they got comparison when they didn't!Issues:
- No clear warnings when MNP unavailable
publication_table()doesn't warn about missing MNP comparisonquick_decision()doesn't check if MNP is actually available- Results structures inconsistent between MNP/no-MNP cases
Recommendation:
# Add to all functions using MNP:
if (!requireNamespace("MNP", quietly = TRUE)) {
message("MNP package not available. Install with: install.packages('MNP')")
message("Proceeding with MNL-only analysis.")
}Issues:
Problem: Script file that runs on source(), not a function
- Creates side effects
- Pollutes environment
- Can't be unit tested
- Runs every time package sources
Should Be: Function that's called explicitly
Problem: Results are raw lists with inconsistent structure
Current:
result <- compare_mnl_mnp(...)
str(result) # Just a list
print(result) # Ugly outputShould Have:
class(result) <- "mnlnp_comparison"
print.mnlnp_comparison()
summary.mnlnp_comparison()
plot.mnlnp_comparison()Problem: test_iia(formula_obj, data_obj, ...) uses non-standard names
Why: Workaround for scoping issues with formula() and data() base R functions
Impact:
- Violates R conventions
- Confusing for users
- Inconsistent with other functions
- Suggests deeper architectural problems
Better Solution: Proper package environment management
What's Tested: ✅
- Basic functionality with clean data
- 2-10 predictors
- 3-5 alternatives
- Core workflows
What's NOT Tested: ❌
- Error handling
- Missing data
- Convergence failures
- Edge cases (rare alternatives, perfect separation)
- Large models (20+ predictors)
- Real datasets with messy data
- Memory limits
- Reproducibility across R versions
- Integration with other packages
Test Quality Issues:
- Tests verify functions run, not that results are correct
- No comparison to known ground truth
- No statistical validation of test statistics
- No regression tests
fit_mnp_safe()- What does "safe" mean to users?compare_mnl_mnp_cv()- CV not obvious (cross-validation)quantify_model_choice_consequences()- Too verbose
recommend_model(n, correlation, ...) # Takes parameters
test_iia(formula_obj, data_obj, ...) # Takes data
quick_decision(n, n_predictors, ...) # Takes parametersWhy the inconsistency?
result <- quick_decision(300, 4)
result$model # "MNL"
result$recommendation # "MNL" (same!)
result$reason # Text
result$reasoning # Same text! (WHY TWO NAMES?)| Feature | MNLNP Status | Grade | Notes |
|---|---|---|---|
| Empirical validation | ❌ Fake data | F | Critical flaw |
| Vignettes | ❌ None | F | Must have |
| Edge case testing | ❌ Minimal | D | Basic only |
| Performance optimization | ❌ None | C | Works but slow |
| S3/S4 classes | ❌ Raw lists | D | Poor UX |
| CRAN checks | ❓ Untested | ? | Unknown |
| CI/CD | ❌ None | F | No automation |
| Code coverage | ❓ ~50%? | ? | Not measured |
| Real examples | ❌ Synthetic only | D | Need real data |
| Benchmark vs alternatives | ❌ None | F | Not compared |
Works for basic cases but not production-ready. Would be rejected from CRAN.
- ✅ Core fitting works - fit_mnp_safe() reliably fits or falls back
- ✅ Robust error handling - Functions don't crash easily
- ✅ Good basic test coverage - 27 tests all passing
- ✅ Flexibility - Handles various specifications (2-10 predictors, 3-5 alternatives)
- ✅ Honest about limitations - Fake data clearly labeled
- ✅ Quick decision tool useful - Provides instant guidance with reasoning
- ✅ Publication table generation - LaTeX/HTML/markdown output works
The foundation is solid. The package demonstrates good software engineering practices in error handling and testing. The problems are primarily about scientific validity, documentation, and production polish rather than fundamental implementation flaws.
HIGH RISK:
- Makes methodological choice based on fake benchmarks → Wrong model selection
- Trusts IIA test with small/rare groups → False conclusions
- Applies to non-standard data (missing values, rare categories) → Crashes
- Uses for publication without validation → Potential retraction
MEDIUM RISK:
- Doesn't realize MNP not installed → Thinks they compared both models
- Misinterprets output → Poor documentation leads to errors
- Doesn't validate results → Accepts incorrect conclusions
LOW RISK:
- Exploratory analysis only → Damage limited to wasted time
- Cross-validates with other methods → Catches errors
Priority 1: Run Real Benchmarks
# Execute pilot simulation (ready to run)
Rscript run_pilot_benchmark.R
# This will generate 1,800 real simulations (~2 hours)
# Replace illustrative data with empirical resultsPriority 2: Fix test_iia() Implementation
- Get econometrician to review Hausman test code
- Fix coefficient matching logic
- Add proper warnings about simplified approach
- Validate against known IIA violations from literature
- Consider just wrapping mlogit's IIA test instead
Priority 3: Add Comprehensive Documentation
- Package vignette: "Introduction to MNLNP"
- Function examples using real datasets
- Interpretation guide for test results
- Common workflows and use cases
Priority 4: Handle MNP Dependency Properly
- Add clear warnings when MNP unavailable
- Document what features require MNP
- Ensure graceful degradation
- Make dependency status transparent to users
Priority 5: Write Vignettes
- "Getting Started with MNLNP"
- "When to Use MNL vs MNP" (with real examples)
- "Interpreting IIA Tests"
- "Creating Publication Tables"
Priority 6: Create S3 Classes
class(result) <- c("mnlnp_comparison", "list")
print.mnlnp_comparison <- function(x, ...) { ... }
summary.mnlnp_comparison <- function(object, ...) { ... }
plot.mnlnp_comparison <- function(x, ...) { ... }Priority 7: Test Edge Cases
- Missing data handling
- Rare alternatives (< 10 obs)
- Large models (50+ predictors)
- Perfect separation
- Convergence failures
- Memory limits
Priority 8: Performance Improvements
- Add progress bars for long operations
- Estimate computation time before starting
- Warn about memory requirements
- Add parallel processing options
Priority 9: Standardize Interfaces
- Consistent parameter names across functions
- Consistent return structures
- Consistent documentation style
Priority 10: Add Real Datasets
- Transportation choice (already have commuter_choice)
- Marketing/brand choice
- Voting behavior
- Healthcare choices
- Multiple domains for validation
Priority 11: Benchmark Against Alternatives
- Compare to mlogit package
- Compare to VGAM package
- Validate against published results
- Document when MNLNP is better/worse
Priority 12: Performance Optimization
- Parallel processing for simulations
- Caching of expensive computations
- Algorithm efficiency improvements
To Production Quality: 4-6 weeks of focused work
| Task | Time Estimate | Dependencies |
|---|---|---|
| Run real benchmarks | 1 week | None - ready now |
| Fix test_iia() | 2-3 days | Expert review needed |
| Write vignettes | 3-4 days | Real benchmarks done |
| Add S3 classes | 2 days | None |
| Edge case testing | 1 week | None |
| Real dataset examples | 2-3 days | Datasets available |
| CRAN checks | 2-3 days | All above done |
| External review | 2 weeks | Package polished |
Critical Path: Real benchmarks → Documentation → External review
Absolutely not in current state.
The fake benchmark data alone disqualifies it. CRAN reviewers would immediately question the scientific validity.
Potentially yes, but not yet.
The concept is sound and there's genuine need for MNL vs MNP guidance. But execution needs substantial improvement before users can trust it.
Run the pilot benchmark simulation.
The script is ready (run_pilot_benchmark.R), it will take ~2 hours, and it will:
- Replace fake data with real results
- Validate/invalidate current recommendations
- Provide empirical foundation for package
- Enable honest scientific publication
Yes, absolutely.
The foundation is solid:
- Core functionality works
- Good error handling
- Reasonable test coverage
- Honest about limitations
- Demonstrates real need
With 4-6 weeks of focused work, this could become a genuinely valuable contribution to the R ecosystem.
Immediate Actions:
- ✅ Run pilot benchmark TODAY (script ready)
- ✅ Get test_iia() reviewed by statistician
- ✅ Write basic "Getting Started" vignette
- ✅ Test with real messy data from different domains
- ✅ Add input validation for edge cases
Before Any Public Use:
- Replace all illustrative data with empirical results
- Add comprehensive warnings about limitations
- Document all known issues clearly
- Get external review from domain expert
For CRAN Submission:
- Complete all HIGH PRIORITY items above
- Pass R CMD check with no errors/warnings
- Add unit tests for edge cases
- Benchmark against existing packages
- Get 2-3 users to beta test
Right Now:
- ❌ Don't use for publication
- ❌ Don't cite in papers
- ❌ Don't recommend to others
- ❌ Don't trust benchmark recommendations
Exploratory Use Only:
- ✅ Can explore basic functionality
- ✅ Can test workflow concepts
- ✅ Can identify additional needs
- ✅ Must validate all results independently
In 4-6 Weeks:
- ✅ Re-evaluate after real benchmarks run
- ✅ Check if test_iia() has been validated
- ✅ Review updated documentation
- ✅ Then consider for real use
Current State: Interesting prototype demonstrating sound concepts Scientific Validity: Questionable due to fabricated benchmarks Production Readiness: 60% - functional but not trustworthy User Risk: HIGH without proper caveats
Potential: With systematic attention to the critical issues identified above, this package could become a valuable tool for applied researchers. The need is real, the approach is reasonable, but the execution requires substantial improvement before public release.
Grade: D+ (60/100)
- Passing, but just barely
- Needs significant work
- Not ready for prime time
- But fixable with effort
Compiled: 2025-11-18 Reviewer: Comprehensive Package Assessment Next Review: After pilot benchmarks complete