YOTA is a simple, educational machine learning engine built from scratch in Java 23 and set up for the Eclipse IDE. It demonstrates how ML tools such as WEKA work internally and pairs that with Power BI-style data analytics.
- Learn by Building: Understand how ML algorithms work under the hood
- Beginner-Friendly: Clear, commented code with real-life analogies
- No Black Boxes: Everything implemented from scratch using basic Java
- Power BI + WEKA: Combines data analytics with machine learning
YOTA/
├── src/
│ ├── core/ # Core data structures
│ │ ├── Attribute.java # Column definitions
│ │ ├── Instance.java # Single data row
│ │ ├── Dataset.java # Complete data table
│ │ └── DataAnalyzer.java # Power BI-style statistics
│ ├── io/
│ │ └── CSVLoader.java # Load CSV files
│ ├── ui/
│ │ └── SummaryPrinter.java # Pretty-print reports
│ ├── algorithms/
│ │ ├── core/
│ │ │ └── DistanceCalculator.java # Distance metrics
│ │ └── classifier/
│ │ └── KNNClassifier.java # K-Nearest Neighbors
│ ├── evaluation/
│ │ ├── ConfusionMatrix.java # Performance evaluation
│ │ └── Evaluator.java # Train-test workflows
│ └── Main.java # Complete pipeline demo
├── sample_data.csv # Sample dataset
└── README.md # This file
- Dataset Summary: Row/column counts, data types
- Descriptive Statistics: Min, Max, Average for numeric columns
- Frequency Analysis: Count occurrences of categorical values
- Missing Value Detection: Identify incomplete data
- Pretty Reports: Formatted output like business intelligence tools
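
As a rough illustration of the statistics listed above, the sketch below computes min, max, and average for a numeric column, counts missing values, and tallies categorical values with a `HashMap`. The class name `ColumnStatsSketch`, its method names, and the choice to represent a missing numeric value as `Double.NaN` are assumptions made for this example; they are not taken from `DataAnalyzer`.

```java
import java.util.HashMap;
import java.util.Map;

class ColumnStatsSketch {

    // Returns {min, max, average, missingCount} for one numeric column.
    // Assumption: a missing value is stored as Double.NaN.
    static double[] numericSummary(double[] column) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        int present = 0;
        int missing = 0;
        for (double value : column) {
            if (Double.isNaN(value)) {
                missing++;          // skip missing values but remember how many there were
                continue;
            }
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            present++;
        }
        // With no usable values the average is undefined, so report NaN.
        double average = present == 0 ? Double.NaN : sum / present;
        return new double[] {min, max, average, missing};
    }

    // Counts how often each categorical value occurs.
    static Map<String, Integer> frequencies(String[] column) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : column) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] ages = {21, 45, Double.NaN, 30};   // toy values, not from sample_data.csv
        double[] s = numericSummary(ages);
        System.out.printf("Min: %.2f, Max: %.2f, Avg: %.2f, Missing: %d%n",
                s[0], s[1], s[2], (int) s[3]);
        System.out.println(frequencies(new String[] {"Hired", "NotHired", "Hired"}));
    }
}
```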
- K-Nearest Neighbors (KNN): Complete implementation from scratch
- Distance Metrics: Euclidean and Manhattan distance
- Classification: Predict categories based on similarity
- Lazy Learning: No complex training phase needed
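
The two distance metrics named above differ only in how per-feature differences are combined: Euclidean distance squares and sums them before taking a square root, while Manhattan distance sums their absolute values. A minimal sketch follows; `DistanceSketch` and its method names are hypothetical and may not match `DistanceCalculator`.

```java
class DistanceSketch {

    // Euclidean distance: square root of the summed squared differences.
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Manhattan distance: sum of the absolute differences.
    static double manhattan(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Points must have the same number of features");
        }
    }

    public static void main(String[] args) {
        double[] p = {0, 0};
        double[] q = {3, 4};
        System.out.println(euclidean(p, q)); // prints 5.0
        System.out.println(manhattan(p, q)); // prints 7.0
    }
}
```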
- Confusion Matrix: Visual performance assessment
- Accuracy Metrics: Precision, Recall, F1-Score
- Train-Test Split: Proper ML evaluation workflow
- Cross-Validation: Robust performance estimation
- K-Value Optimization: Find best parameters automatically
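
To make the train-test split concrete, here is one possible sketch: shuffle the row indices with a fixed seed, then cut the shuffled list at the chosen ratio. The names `TrainTestSplitSketch` and `split`, the seed parameter, and the index-based return value are assumptions for illustration; `Evaluator` may organize this differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class TrainTestSplitSketch {

    // Shuffles row indices and splits them into two parts.
    // Returns a two-element list: element 0 holds training indices, element 1 holds test indices.
    static List<List<Integer>> split(int rowCount, double trainRatio, long seed) {
        if (trainRatio <= 0.0 || trainRatio >= 1.0) {
            throw new IllegalArgumentException("trainRatio must be between 0 and 1 (exclusive)");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            indices.add(i);
        }
        // A fixed seed makes the shuffle repeatable between runs.
        Collections.shuffle(indices, new Random(seed));

        int trainSize = (int) Math.round(rowCount * trainRatio);
        List<Integer> train = new ArrayList<>(indices.subList(0, trainSize));
        List<Integer> test = new ArrayList<>(indices.subList(trainSize, rowCount));
        return List.of(train, test);
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(20, 0.8, 42L);
        System.out.println("Train rows: " + parts.get(0).size()); // prints Train rows: 16
        System.out.println("Test rows: " + parts.get(1).size());  // prints Test rows: 4
    }
}
```

Keeping the test rows out of training is what makes the reported accuracy an estimate of performance on unseen data rather than a measure of memorization.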
- Java 23: Modern Java language features
- Eclipse IDE: Professional development environment
- Pure Java: No external libraries or frameworks
- CSV Files: Standard data format support
- Java 23 installed
- Eclipse IDE (any recent version)
- Basic understanding of:
- Java programming
- Object-oriented concepts
- CSV file format
# Clone or download the project
# Open Eclipse IDE
# Import project into Eclipse workspace

# In Eclipse:
# Right-click on Main.java
# Select "Run As" → "Java Application"
# Or use command line:
cd YOTA/
javac -d bin $(find src -name "*.java")
java -cp bin Main

🚀 YOTA ML Engine Started
=================================
📂 Loading dataset...
✅ Dataset loaded: Dataset{Employee Data | Attributes: 4 | Instances: 20}
📊 Analyzing data...
===== DATA SUMMARY =====
Dataset: Employee Data
Rows: 20
Columns: 4
===== COLUMN STATS =====
Age (numeric) -> Min: 21.00, Max: 45.00, Avg: 30.25
Salary (numeric) -> Min: 38000.00, Max: 95000.00, Avg: 61400.00
Experience (numeric) -> Min: 0.00, Max: 15.00, Avg: 5.40
Hired (categorical) -> Unique values: 2
🤖 Starting Machine Learning...
Testing different K values:
K=1 → Accuracy: 85.00%
K=3 → Accuracy: 90.00%
K=5 → Accuracy: 85.00%
K=7 → Accuracy: 80.00%
🏆 Best K value: 3 (Accuracy: 90.00%)
📈 Detailed Evaluation with K=3
===== CONFUSION MATRIX =====
            Hired  NotHired
Hired           9         1
NotHired        0         5
Overall Accuracy: 93.33%
🎯 Demo Predictions:
Junior Candidate (Age: 26, Salary: $47000, Exp: 2 years) → Prediction: NotHired
Mid-level Candidate (Age: 32, Salary: $68000, Exp: 6 years) → Prediction: Hired
Senior Candidate (Age: 40, Salary: $90000, Exp: 12 years) → Prediction: Hired
🎉 YOTA ML Engine Complete!
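
The overall accuracy in the output above follows directly from the confusion matrix: correct predictions sit on the diagonal, so accuracy is (9 + 5) / 15 ≈ 93.33%. The sketch below does that arithmetic and also derives precision, recall, and F1 for one class. It assumes rows are actual classes and columns are predicted classes, which the output does not state; `ConfusionMatrixSketch` and its methods are hypothetical names, not the API of `ConfusionMatrix`.

```java
class ConfusionMatrixSketch {

    // Accuracy: diagonal (correct) counts divided by all counts.
    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                total += m[r][c];
                if (r == c) {
                    correct += m[r][c];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Precision for class c: true positives / everything predicted as c (column sum).
    static double precision(int[][] m, int c) {
        int predicted = 0;
        for (int[] row : m) {
            predicted += row[c];
        }
        return predicted == 0 ? 0.0 : (double) m[c][c] / predicted;
    }

    // Recall for class c: true positives / everything that actually is c (row sum).
    static double recall(int[][] m, int c) {
        int actual = 0;
        for (int value : m[c]) {
            actual += value;
        }
        return actual == 0 ? 0.0 : (double) m[c][c] / actual;
    }

    // F1: harmonic mean of precision and recall.
    static double f1(int[][] m, int c) {
        double p = precision(m, c);
        double r = recall(m, c);
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Matrix from the sample output; index 0 = Hired, index 1 = NotHired.
        int[][] matrix = {{9, 1}, {0, 5}};
        System.out.printf("Accuracy: %.2f%%%n", 100 * accuracy(matrix));
        System.out.printf("Hired precision: %.2f, recall: %.2f, F1: %.2f%n",
                precision(matrix, 0), recall(matrix, 0), f1(matrix, 0));
    }
}
```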
- Simplicity > Performance: Easy to understand algorithms
- Readability > Cleverness: Clear variable names and comments
- Learning > Shortcuts: Everything implemented from scratch
- Real-Life Analogies: Complex concepts explained simply
- Bubble Sort: Simple O(n²) sorting for K-nearest neighbors
- Euclidean Distance: Standard distance metric in ML
- Majority Voting: Democratic decision making for classification
- Train-Test Split: Proper ML evaluation methodology
- ArrayList: Dynamic arrays for flexible data storage
- HashMap: Fast key-value lookup for frequency counting
- Simple Arrays: Fixed-size collections for sorting
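
The algorithm and data-structure choices listed above fit together in a few dozen lines. The following sketch computes Euclidean distances into plain arrays, bubble-sorts them, and takes a majority vote with a `HashMap`. It is an illustration under assumptions, not the code of `KNNClassifier`: the class name `KnnSketch`, the `predict` signature, the tie-breaking rule, and the toy data are all made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class KnnSketch {

    static String predict(double[][] trainX, String[] trainY, double[] query, int k) {
        int n = trainX.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and " + n);
        }

        // Step 1: distance from the query to every training row.
        double[] distances = new double[n];
        String[] labels = new String[n];
        for (int i = 0; i < n; i++) {
            distances[i] = euclidean(trainX[i], query);
            labels[i] = trainY[i];
        }

        // Step 2: bubble sort by ascending distance, O(n^2), keeping labels aligned.
        for (int pass = 0; pass < n - 1; pass++) {
            for (int j = 0; j < n - 1 - pass; j++) {
                if (distances[j] > distances[j + 1]) {
                    double d = distances[j];
                    distances[j] = distances[j + 1];
                    distances[j + 1] = d;
                    String l = labels[j];
                    labels[j] = labels[j + 1];
                    labels[j + 1] = l;
                }
            }
        }

        // Step 3: majority vote among the k nearest rows.
        // Assumption: on a tie, the label that reaches the top count first wins.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int i = 0; i < k; i++) {
            int count = votes.merge(labels[i], 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = labels[i];
            }
        }
        return best;
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Toy data: two well-separated clusters labeled "A" and "B".
        double[][] x = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        String[] y = {"A", "A", "A", "B", "B", "B"};
        System.out.println(predict(x, y, new double[] {2, 2}, 3)); // prints A
        System.out.println(predict(x, y, new double[] {8.5, 8.5}, 3)); // prints B
    }
}
```

Because there is no training step beyond storing the rows, all of the work happens at prediction time, which is what "lazy learning" refers to.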
- Understand `Attribute`, `Instance`, `Dataset` classes
- Learn how CSV files are parsed and structured
- Explore data types and storage strategies
- Study `DataAnalyzer` for statistical computations
- Practice with `SummaryPrinter` for report generation
- Understand descriptive statistics concepts
- Learn distance calculations in `DistanceCalculator`
- Understand KNN algorithm in `KNNClassifier`
- Practice prediction and classification concepts
- Study confusion matrices and accuracy metrics
- Learn train-test split methodology
- Understand cross-validation concepts
- Create a CSV file with format: `feature1,feature2,...,class`
- Place in project root directory
- Update filename in `Main.java`
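
To show what the expected row layout implies, here is a small sketch that splits one line into numeric features plus a class label taken from the last column. It assumes every feature is numeric, that values contain no quoted commas, and that any header row is handled elsewhere. `CsvRowSketch`, `Row`, and `parse` are hypothetical names; `CSVLoader` may work differently.

```java
class CsvRowSketch {

    // One parsed row: numeric features plus the class label from the last column.
    static final class Row {
        final double[] features;
        final String label;

        Row(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    static Row parse(String line) {
        // The -1 limit keeps trailing empty cells instead of silently dropping them.
        String[] cells = line.split(",", -1);
        if (cells.length < 2) {
            throw new IllegalArgumentException(
                    "Expected at least one feature and a class label: " + line);
        }
        double[] features = new double[cells.length - 1];
        for (int i = 0; i < features.length; i++) {
            // Throws NumberFormatException for empty or non-numeric cells;
            // missing-value handling is out of scope for this sketch.
            features[i] = Double.parseDouble(cells[i].trim());
        }
        return new Row(features, cells[cells.length - 1].trim());
    }

    public static void main(String[] args) {
        // Illustrative row using the Age, Salary, Experience, Hired columns from the sample output.
        Row row = parse("26,47000,2,NotHired");
        System.out.println(row.features.length + " features, class = " + row.label);
    }
}
```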
- Create new class in `algorithms/classifier/`
- Follow the same pattern as `KNNClassifier`
- Add evaluation in `Main.java`
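
One way to read "follow the same pattern" is a shared train/predict shape. The method signatures of `KNNClassifier` are not shown in this README, so the `ClassifierSketch` interface below is an assumption, and `MajorityClassSketch` is a deliberately trivial baseline (it always predicts the most frequent training label) chosen only to show where a new algorithm's logic would go.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract; the real KNNClassifier may expose different methods.
interface ClassifierSketch {
    void train(double[][] features, String[] labels);

    String predict(double[] features);
}

// Baseline classifier: ignores the features and predicts the most common training label.
class MajorityClassSketch implements ClassifierSketch {

    private String majorityLabel;

    @Override
    public void train(double[][] features, String[] labels) {
        majorityLabel = null; // forget any earlier training run
        Map<String, Integer> counts = new HashMap<>();
        int bestCount = 0;
        for (String label : labels) {
            int count = counts.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                majorityLabel = label;
            }
        }
    }

    @Override
    public String predict(double[] features) {
        if (majorityLabel == null) {
            throw new IllegalStateException("Call train() with at least one row first");
        }
        return majorityLabel;
    }
}
```

A baseline like this is also a useful yardstick during evaluation: a new algorithm that cannot beat it on the test set has not learned anything from the features.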
- Extend `DataAnalyzer` for new statistics
- Update `SummaryPrinter` for new reports
- Test with your datasets
This is an educational project! Feel free to:
- Add new ML algorithms (Decision Trees, Naive Bayes, etc.)
- Improve data visualization
- Add more statistical measures
- Enhance documentation with examples
- WEKA: Waikato Environment for Knowledge Analysis
- Power BI: Microsoft Business Intelligence Platform
- Java Documentation: Oracle Java SE Documentation
- ML Basics: Introduction to Statistical Learning
This project is for educational purposes. Feel free to use, modify, and learn from it.
By completing this project, you will understand:
- ✅ How ML libraries work internally
- ✅ Data processing and analysis workflows
- ✅ Algorithm implementation from scratch
- ✅ Software design patterns in Java
- ✅ Performance evaluation methodologies
Happy Learning! 🎓🚀
Built with ❤️ for Java beginners and ML enthusiasts