This project uses machine learning techniques to detect Parkinson's Disease based on a range of biomedical voice measurements. It applies a Support Vector Machine (SVM) model to classify whether a person is affected by the disease.
The dataset used in this project was obtained from the UCI Machine Learning Repository. It consists of 195 voice recordings from individuals, each with 22 voice-related features extracted using signal processing techniques.
- Target column: `status` (1 = Parkinson’s Disease, 0 = Healthy)
Key features used include:
- MDVP:Fo(Hz) – Average vocal fundamental frequency
- Jitter, Shimmer – Measures of variation in frequency and amplitude
- NHR, HNR – Noise-to-harmonics and harmonics-to-noise ratios
- DFA, RPDE, PPE – Nonlinear dynamic complexity measures
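As a rough illustration of how the target column and the features above might be separated before modeling, here is a minimal sketch. The helper `split_features_target`, the file name `parkinsons.csv`, and the `name` identifier column are assumptions made for this example, and the tiny frame holds made-up placeholder values rather than real recordings.

```python
import pandas as pd


def split_features_target(df, target="status", drop=("name",)):
    """Separate the feature matrix from the binary target column.

    `drop` lists non-feature columns (assumed here: a `name` identifier)
    that are removed only if they are present.
    """
    extra = [col for col in drop if col in df.columns]
    X = df.drop(columns=[target, *extra])
    y = df[target]
    return X, y


# Placeholder frame with made-up values, standing in for the real file,
# which might be loaded with something like pd.read_csv("parkinsons.csv")
# (hypothetical file name).
demo = pd.DataFrame({
    "name": ["rec_a", "rec_b", "rec_c", "rec_d"],
    "MDVP:Fo(Hz)": [120.0, 150.5, 198.3, 110.2],
    "HNR": [21.0, 19.5, 26.1, 18.7],
    "PPE": [0.28, 0.33, 0.12, 0.36],
    "status": [1, 1, 0, 1],
})

X, y = split_features_target(demo)

# Basic exploration: total missing values and class balance of the target.
missing = int(demo.isnull().sum().sum())
class_counts = y.value_counts().to_dict()
print(X.shape, missing, class_counts)
```

Checking the class balance early is worthwhile because accuracy alone can be misleading when one class dominates.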
- Python
- Pandas, NumPy
- Scikit-learn
- Google Colab
The main model used in this project is:
- Support Vector Machine (SVM) with Linear Kernel
- Efficient for linearly separable data
- Features scaled using `StandardScaler`
- Data loading and exploration
- Checking for missing values
- Splitting data into training and test sets
- Feature scaling using `StandardScaler`
- Training the model using `SVC(kernel='linear')`
- Evaluating model performance using `accuracy_score`
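The steps above can be sketched roughly as follows. This is not the project's notebook: the data is a synthetic stand-in generated with `make_classification` (matching only the 195 x 22 shape of the dataset), and the 80/20 split, stratification, and `random_state` values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the voice dataset: 195 samples, 22 features.
# These are NOT real measurements; only the shape mirrors the dataset.
X, y = make_classification(
    n_samples=195, n_features=22, n_informative=8, random_state=0
)

# Hold out a test set (assumed 80/20 split, stratified on the label).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Fit the scaler on the training data only, then apply it to both sets,
# so that no test-set statistics leak into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a linear-kernel SVM and score it on the held-out data.
model = SVC(kernel="linear")
model.fit(X_train_scaled, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print(f"Test accuracy on synthetic data: {test_accuracy:.3f}")
```

The printed score reflects the synthetic data only and says nothing about performance on the actual recordings.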
The SVM model achieved high accuracy on the test set, demonstrating its effectiveness in classifying Parkinson’s cases from voice data.
- Try other classification models (Logistic Regression, Random Forest)
- Perform hyperparameter tuning (GridSearchCV)
- Implement cross-validation
- Add a front-end interface for real-world testing
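One possible way to combine the tuning and cross-validation ideas above is to wrap the scaler and the SVM in a `Pipeline` and search over it with `GridSearchCV`. The parameter grid, the 5-fold stratified split, and the synthetic stand-in data below are all assumptions for illustration, not choices made in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same shape as the dataset (not real data).
X, y = make_classification(
    n_samples=195, n_features=22, n_informative=8, random_state=0
)

# Putting the scaler inside the pipeline means it is re-fitted on each
# training fold, so validation folds never influence the scaling.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])

# Hypothetical search space: kernel type and regularization strength C.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Other classifiers such as `LogisticRegression` or `RandomForestClassifier` could be swapped into the final pipeline step to compare models under the same cross-validation scheme.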
- Dataset source: UCI Parkinson’s Dataset
- Project inspired by health-focused ML applications