YOTA - Beginner-Friendly Java ML Engine

A simple, educational machine learning engine built from scratch in Java 23 for Eclipse IDE. This project demonstrates how popular ML tools like WEKA work internally, combined with Power BI-style data analytics.

🎯 Project Goals

Learn by Building: Understand how ML algorithms work under the hood
Beginner-Friendly: Clear, commented code with real-life analogies
No Black Boxes: Everything implemented from scratch using basic Java
Power BI + WEKA: Combines data analytics with machine learning

📁 Project Structure

YOTA/
├── src/
│   ├── core/                 # Core data structures
│   │   ├── Attribute.java    # Column definitions
│   │   ├── Instance.java     # Single data row
│   │   ├── Dataset.java      # Complete data table
│   │   └── DataAnalyzer.java # Power BI-style statistics
│   ├── io/
│   │   └── CSVLoader.java    # Load CSV files
│   ├── ui/
│   │   └── SummaryPrinter.java # Pretty-print reports
│   ├── algorithms/
│   │   ├── core/
│   │   │   └── DistanceCalculator.java # Distance metrics
│   │   └── classifier/
│   │       └── KNNClassifier.java      # K-Nearest Neighbors
│   ├── evaluation/
│   │   ├── ConfusionMatrix.java # Performance evaluation
│   │   └── Evaluator.java       # Train-test workflows
│   └── Main.java             # Complete pipeline demo
├── sample_data.csv           # Sample dataset
└── README.md                 # This file

🚀 Features

📊 Power BI-Style Data Analytics

Dataset Summary: Row/column counts, data types
Descriptive Statistics: Min, Max, Average for numeric columns
Frequency Analysis: Count occurrences of categorical values
Missing Value Detection: Identify incomplete data
Pretty Reports: Formatted output like business intelligence tools
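As a rough illustration of what these statistics involve, here is a minimal, self-contained sketch. The names (`ColumnStatsSketch`, `minMaxAvg`, `frequencies`, `countMissing`) are hypothetical and are not YOTA's actual DataAnalyzer API. It also assumes that missing numeric cells are stored as `Double.NaN` and missing categorical cells as `null` or an empty string.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Power BI-style column statistics; not the real DataAnalyzer API.
public class ColumnStatsSketch {

    // Returns {min, max, avg} over the non-missing values of a numeric column.
    // Assumption: missing numeric cells are stored as Double.NaN.
    static double[] minMaxAvg(double[] column) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        int count = 0;
        for (double v : column) {
            if (Double.isNaN(v)) continue; // skip missing values
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            count++;
        }
        double avg = count == 0 ? Double.NaN : sum / count;
        return new double[] {min, max, avg};
    }

    // Counts how often each categorical value occurs, ignoring null or empty cells.
    static Map<String, Integer> frequencies(List<String> column) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : column) {
            if (value == null || value.isEmpty()) continue;
            counts.merge(value, 1, Integer::sum); // add 1 to this value's tally
        }
        return counts;
    }

    // Counts missing cells (NaN) in a numeric column.
    static int countMissing(double[] column) {
        int missing = 0;
        for (double v : column) {
            if (Double.isNaN(v)) missing++;
        }
        return missing;
    }

    public static void main(String[] args) {
        double[] age = {21, 45, Double.NaN, 30};
        double[] s = minMaxAvg(age);
        System.out.printf("Age -> Min: %.2f, Max: %.2f, Avg: %.2f%n", s[0], s[1], s[2]);
        System.out.println("Missing ages: " + countMissing(age));
        System.out.println(frequencies(List.of("Hired", "NotHired", "Hired")));
    }
}
```

For the toy age column this reports Min 21.00, Max 45.00 and Avg 32.00 (the NaN cell is skipped). The frequency map's print order is not guaranteed, because HashMap is unordered.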

🤖 Machine Learning (WEKA-Like)

K-Nearest Neighbors (KNN): Complete implementation from scratch
Distance Metrics: Euclidean and Manhattan distance
Classification: Predict categories based on similarity
Lazy Learning: No real training phase; the model stores the training data and does its work at prediction time
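The two distance metrics can be sketched as plain loops over feature arrays. `DistanceSketch` is a hypothetical class written for illustration, not the real DistanceCalculator signature.

```java
// Hypothetical sketch of the two distance metrics; not the real DistanceCalculator API.
public class DistanceSketch {

    // Euclidean: square root of the sum of squared differences ("as the crow flies").
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Manhattan: sum of absolute differences ("walking along city blocks").
    static double manhattan(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Feature vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] p = {0, 0};
        double[] q = {3, 4};
        System.out.println("Euclidean: " + euclidean(p, q)); // Euclidean: 5.0
        System.out.println("Manhattan: " + manhattan(p, q)); // Manhattan: 7.0
    }
}
```

Because both metrics add up raw differences, a column measured in tens of thousands (Salary) will outweigh one measured in years (Age) unless the features are normalized first; check DistanceCalculator and KNNClassifier to see whether YOTA does this.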

📈 Evaluation & Testing

Confusion Matrix: Visual performance assessment
Performance Metrics: Accuracy, Precision, Recall, F1-Score
Train-Test Split: Proper ML evaluation workflow
Cross-Validation: Robust performance estimation
K-Value Optimization: Try several K values and keep the one with the best accuracy
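Precision, recall and F1 all come straight from the confusion-matrix counts. This sketch assumes rows are actual classes and columns are predicted classes (the layout WEKA uses); YOTA's ConfusionMatrix may lay things out differently, and the `MetricsSketch` names are hypothetical. It reuses the 2x2 matrix shown in the Expected Output section.

```java
// Hypothetical sketch: accuracy and per-class precision, recall and F1 from a confusion matrix.
// Assumption: m[actual][predicted], i.e. rows = actual class, columns = predicted class.
public class MetricsSketch {

    // Fraction of all instances on the diagonal (correctly classified).
    static double accuracy(int[][] m) {
        int correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Precision for class c: of everything predicted as c, how much really was c? (column c)
    static double precision(int[][] m, int c) {
        int predicted = 0;
        for (int[] row : m) predicted += row[c];
        return predicted == 0 ? 0.0 : (double) m[c][c] / predicted;
    }

    // Recall for class c: of everything that really was c, how much was found? (row c)
    static double recall(int[][] m, int c) {
        int actual = 0;
        for (int v : m[c]) actual += v;
        return actual == 0 ? 0.0 : (double) m[c][c] / actual;
    }

    // F1: harmonic mean of precision and recall.
    static double f1(int[][] m, int c) {
        double p = precision(m, c);
        double r = recall(m, c);
        return (p + r) == 0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Class 0 = Hired, class 1 = NotHired (matrix from the expected output).
        int[][] m = {{9, 1}, {0, 5}};
        System.out.printf("Accuracy: %.2f%%%n", 100 * accuracy(m));
        System.out.printf("Hired -> Precision: %.2f, Recall: %.2f, F1: %.2f%n",
                precision(m, 0), recall(m, 0), f1(m, 0));
    }
}
```

With that matrix it prints an accuracy of 93.33% (14 of 15 correct), matching the report, and for Hired a precision of 1.00, a recall of 0.90 and an F1 of 0.95.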

🛠️ Technologies Used

Java 23: Modern Java language features
Eclipse IDE: Professional development environment
Pure Java: No external libraries or frameworks
CSV Files: Standard data format support

📋 Prerequisites

Java 23 installed
Eclipse IDE (a recent version with Java 23 support)
Basic understanding of:
- Java programming
- Object-oriented concepts
- CSV file format

🏃‍♂️ How to Run

1. Setup Project

# Clone or download the project
# Open Eclipse IDE
# Import project into Eclipse workspace

2. Compile and Run

# In Eclipse:
# Right-click on Main.java
# Select "Run As" → "Java Application"

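# Or use the command line (bash). A plain src/**/*.java glob
# skips src/Main.java and nested folders unless globstar is enabled:
cd YOTA/
javac -d bin $(find src -name "*.java")
java -cp bin Main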

3. Expected Output

🚀 YOTA ML Engine Started
=================================
📂 Loading dataset...
✅ Dataset loaded: Dataset{Employee Data | Attributes: 4 | Instances: 20}

📊 Analyzing data...
===== DATA SUMMARY =====
Dataset: Employee Data
Rows: 20
Columns: 4

===== COLUMN STATS =====
Age (numeric) -> Min: 21.00, Max: 45.00, Avg: 30.25
Salary (numeric) -> Min: 38000.00, Max: 95000.00, Avg: 61400.00
Experience (numeric) -> Min: 0.00, Max: 15.00, Avg: 5.40
Hired (categorical) -> Unique values: 2

🤖 Starting Machine Learning...
Testing different K values:
K=1 → Accuracy: 85.00%
K=3 → Accuracy: 90.00%
K=5 → Accuracy: 85.00%
K=7 → Accuracy: 80.00%
🏆 Best K value: 3 (Accuracy: 90.00%)

📈 Detailed Evaluation with K=3
===== CONFUSION MATRIX =====
           Hired NotHired
    Hired      9        1
 NotHired      0        5
Overall Accuracy: 93.33%

🎯 Demo Predictions:
Junior Candidate (Age: 26, Salary: $47000, Exp: 2 years) → Prediction: NotHired
Mid-level Candidate (Age: 32, Salary: $68000, Exp: 6 years) → Prediction: Hired
Senior Candidate (Age: 40, Salary: $90000, Exp: 12 years) → Prediction: Hired

🎉 YOTA ML Engine Complete!

📚 Educational Features

Code Style Rules

Simplicity > Performance: Easy to understand algorithms
Readability > Cleverness: Clear variable names and comments
Learning > Shortcuts: Everything implemented from scratch
Real-Life Analogies: Complex concepts explained simply

Algorithms Implemented

Bubble Sort: Simple O(n²) sorting of training rows by distance to find the K nearest neighbors
Euclidean Distance: Standard distance metric in ML
Majority Voting: Each of the K nearest neighbors votes for its class; the most common class becomes the prediction
Train-Test Split: Proper ML evaluation methodology
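Putting these pieces together, a KNN prediction measures the distance to every training row, bubble-sorts the rows by distance, and lets the K closest vote. This is a minimal sketch assuming numeric features in `double[]` arrays and string labels; `KnnSketch.predict` is a hypothetical name, not KNNClassifier's real API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of KNN prediction with bubble sort and majority voting.
public class KnnSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static String predict(double[][] trainX, String[] trainY, double[] query, int k) {
        int n = trainX.length;
        double[] dist = new double[n];
        int[] idx = new int[n]; // remembers which training row each distance belongs to
        for (int i = 0; i < n; i++) {
            dist[i] = euclidean(trainX[i], query);
            idx[i] = i;
        }
        // Bubble sort by distance: O(n^2) but easy to follow. Swap distances and indices together.
        for (int pass = 0; pass < n - 1; pass++) {
            for (int j = 0; j < n - 1 - pass; j++) {
                if (dist[j] > dist[j + 1]) {
                    double td = dist[j]; dist[j] = dist[j + 1]; dist[j + 1] = td;
                    int ti = idx[j]; idx[j] = idx[j + 1]; idx[j + 1] = ti;
                }
            }
        }
        // Majority voting among the k closest neighbors.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int i = 0; i < Math.min(k, n); i++) {
            String label = trainY[idx[i]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) { // on a tie, the label that reached the count first keeps the lead
                best = label;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy training data: two numeric features per row.
        double[][] trainX = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {8, 9}, {9, 8}};
        String[] trainY = {"NotHired", "NotHired", "NotHired", "Hired", "Hired", "Hired"};
        System.out.println(predict(trainX, trainY, new double[] {7, 8}, 3));     // Hired
        System.out.println(predict(trainX, trainY, new double[] {1.5, 1.5}, 3)); // NotHired
    }
}
```

"Training" here is nothing more than keeping trainX and trainY around, which is why KNN is called a lazy learner. Bubble sort matches the list above; `Arrays.sort` or a size-K priority queue would be faster on larger datasets.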

Data Structures Used

ArrayList: Dynamic arrays for flexible data storage
HashMap: Fast key-value lookup for frequency counting
Simple Arrays: Fixed-size collections for sorting

🎓 Learning Path

Phase 1: Data Handling

Understand Attribute, Instance, Dataset classes
Learn how CSV files are parsed and structured
Explore data types and storage strategies
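To make the three core classes concrete, here is a hypothetical, cut-down version using Java records. The field names, storing raw cell values as strings, and the `numericValue` helper are assumptions; YOTA's real Attribute, Instance and Dataset classes may be organized differently. The `toString` format mimics the one in the expected output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the core data model; not YOTA's actual classes.
public class DataModelSketch {

    // A column definition: its name and whether it holds numbers or categories.
    record Attribute(String name, boolean numeric) {}

    // One row, with cell values kept as the strings read from the CSV.
    record Instance(List<String> values) {
        double numericValue(int column) {
            return Double.parseDouble(values.get(column));
        }
    }

    // The whole table: column definitions plus rows.
    static class Dataset {
        final String name;
        final List<Attribute> attributes;
        final List<Instance> instances = new ArrayList<>();

        Dataset(String name, List<Attribute> attributes) {
            this.name = name;
            this.attributes = attributes;
        }

        // Rejects rows whose width does not match the column definitions.
        void add(Instance row) {
            if (row.values().size() != attributes.size()) {
                throw new IllegalArgumentException("Row has wrong number of columns");
            }
            instances.add(row);
        }

        @Override
        public String toString() {
            return "Dataset{" + name + " | Attributes: " + attributes.size()
                    + " | Instances: " + instances.size() + "}";
        }
    }

    public static void main(String[] args) {
        Dataset data = new Dataset("Employee Data", List.of(
                new Attribute("Age", true),
                new Attribute("Salary", true),
                new Attribute("Experience", true),
                new Attribute("Hired", false)));
        data.add(new Instance(List.of("26", "47000", "2", "NotHired")));
        System.out.println(data); // Dataset{Employee Data | Attributes: 4 | Instances: 1}
    }
}
```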

Phase 2: Data Analysis

Study DataAnalyzer for statistical computations
Practice with SummaryPrinter for report generation
Understand descriptive statistics concepts

Phase 3: Machine Learning

Learn distance calculations in DistanceCalculator
Understand KNN algorithm in KNNClassifier
Practice prediction and classification concepts

Phase 4: Evaluation

Study confusion matrices and accuracy metrics
Learn train-test split methodology
Understand cross-validation concepts
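A train-test split and k-fold cross-validation both come down to shuffling row indices and partitioning them. This minimal sketch uses hypothetical names (`trainTestSplit`, `kFolds`); the 75/25 split, the 5 folds and the seed 42 are arbitrary example values, not YOTA's defaults.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: shuffled train-test split and k-fold assignment over row indices.
public class SplitSketch {

    // Shuffle row indices with a fixed seed (reproducible), then cut into train and test.
    static List<List<Integer>> trainTestSplit(int rows, double trainFraction, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < rows; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        int cut = (int) Math.round(rows * trainFraction);
        List<Integer> train = new ArrayList<>(idx.subList(0, cut));
        List<Integer> test = new ArrayList<>(idx.subList(cut, rows));
        return List.of(train, test);
    }

    // k-fold: every row lands in exactly one fold; each fold takes one turn as the test set.
    static List<List<Integer>> kFolds(int rows, int k, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < rows; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < rows; i++) {
            folds.get(i % k).add(idx.get(i)); // round-robin keeps fold sizes within 1 of each other
        }
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> split = trainTestSplit(20, 0.75, 42L);
        System.out.println("Train rows: " + split.get(0).size() + ", test rows: " + split.get(1).size());
        List<List<Integer>> folds = kFolds(20, 5, 42L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("Fold " + f + ": " + folds.get(f));
        }
    }
}
```

For 20 rows this gives 15 training and 5 test rows, and five folds of 4 rows each. In cross-validation you train on four folds, test on the fifth, repeat for every fold, and average the accuracies.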

🔧 Customization

Add Your Own Data

Create a CSV file with format: feature1,feature2,...,class
Place in project root directory
Update filename in Main.java
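Here is a sketch of how rows in that `feature1,feature2,...,class` format can be turned into feature arrays and labels. It assumes a header row, all-numeric feature columns, the class in the last column, and no quoted fields or embedded commas. `CsvSketch` and `my_data.csv` are hypothetical names, and YOTA's CSVLoader may handle more cases.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of parsing "feature1,feature2,...,class" rows; not the real CSVLoader.
public class CsvSketch {

    record Row(double[] features, String label) {}

    static List<Row> parse(BufferedReader reader) throws IOException {
        List<Row> rows = new ArrayList<>();
        String header = reader.readLine(); // first line holds column names; skipped here
        if (header == null) return rows;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isBlank()) continue; // ignore empty lines
            String[] cells = line.split(",");
            double[] features = new double[cells.length - 1];
            for (int i = 0; i < features.length; i++) {
                features[i] = Double.parseDouble(cells[i].trim()); // assumes numeric features
            }
            rows.add(new Row(features, cells[cells.length - 1].trim())); // last column = class
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String csv = "Age,Salary,Experience,Hired\n26,47000,2,NotHired\n40,90000,12,Hired\n";
        // For a real file: new BufferedReader(new FileReader("my_data.csv"))
        List<Row> rows = parse(new BufferedReader(new StringReader(csv)));
        for (Row r : rows) {
            System.out.println(Arrays.toString(r.features()) + " -> " + r.label());
        }
    }
}
```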

Implement New Algorithms

Create new class in algorithms/classifier/
Follow the same pattern as KNNClassifier
Add evaluation in Main.java
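One way to let new algorithms plug into the same evaluation code is a shared interface. `SimpleClassifier` here is a hypothetical interface (KNNClassifier's real methods may be named differently), and ZeroR, the majority-class baseline that WEKA also ships, serves as the simplest possible implementation: it ignores the features and always predicts the most frequent training label.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common interface; YOTA's classifiers may use different method names.
interface SimpleClassifier {
    void train(List<double[]> features, List<String> labels);
    String predict(double[] features);
}

// ZeroR baseline: always predicts the most common training label.
class ZeroRClassifier implements SimpleClassifier {
    private String majority;

    @Override
    public void train(List<double[]> features, List<String> labels) {
        majority = null; // forget any previous training run
        Map<String, Integer> counts = new HashMap<>();
        int best = 0;
        for (String label : labels) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > best) {
                best = c;
                majority = label;
            }
        }
    }

    @Override
    public String predict(double[] features) {
        if (majority == null) throw new IllegalStateException("Call train() first");
        return majority;
    }
}

public class ZeroRDemo {
    public static void main(String[] args) {
        SimpleClassifier model = new ZeroRClassifier();
        model.train(List.of(new double[] {26}, new double[] {32}, new double[] {40}),
                List.of("NotHired", "Hired", "Hired"));
        System.out.println(model.predict(new double[] {30})); // Hired
    }
}
```

A new algorithm that cannot beat ZeroR's accuracy has learned nothing beyond the class balance, which makes it a handy sanity check in Main.java.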

Add New Features

Extend DataAnalyzer for new statistics
Update SummaryPrinter for new reports
Test with your datasets
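As an example of extending DataAnalyzer, median and standard deviation need only a few lines each. The method names are hypothetical. This version uses the population standard deviation (divide by n) and `Arrays.sort` for brevity; swap in the sample formula (divide by n - 1) or the project's own bubble sort if you prefer.

```java
import java.util.Arrays;

// Hypothetical extra statistics for DataAnalyzer; names are illustrative.
public class MoreStatsSketch {

    // Median: middle value after sorting (average of the two middle values for an even count).
    static double median(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("No values");
        double[] sorted = values.clone(); // leave the caller's array in its original order
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Population standard deviation: typical distance of values from the average.
    static double stdDev(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("No values");
        double sum = 0.0;
        for (double v : values) sum += v;
        double mean = sum / values.length;
        double squares = 0.0;
        for (double v : values) squares += (v - mean) * (v - mean);
        return Math.sqrt(squares / values.length);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("Median: " + median(data));  // Median: 4.5
        System.out.println("Std dev: " + stdDev(data)); // Std dev: 2.0
    }
}
```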

🤝 Contributing

This is an educational project! Feel free to:

Add new ML algorithms (Decision Trees, Naive Bayes, etc.)
Improve data visualization
Add more statistical measures
Enhance documentation with examples

📖 References

WEKA: Waikato Environment for Knowledge Analysis
Power BI: Microsoft Business Intelligence Platform
Java Documentation: Oracle Java SE Documentation
ML Basics: An Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani)

📄 License

This project is for educational purposes. Feel free to use, modify, and learn from it.

🏆 Achievements

By completing this project, you will understand:

✅ How ML libraries work internally
✅ Data processing and analysis workflows
✅ Algorithm implementation from scratch
✅ Software design patterns in Java
✅ Performance evaluation methodologies

Happy Learning! 🎓🚀

Built with ❤️ for Java beginners and ML enthusiasts

Repository: AkhilsaiSammeta/yota
