This project focuses on early detection of Parkinson’s Disease (PD) using machine learning on vocal biomarkers.
The model analyses voice features such as jitter, shimmer, and frequency variation to classify whether a patient has Parkinson’s disease. It achieves 92.3% accuracy and an ROC-AUC of 0.96.
| Category | Details |
|---|---|
| Domain | Healthcare / Bioinformatics / AI |
| Objective | Early diagnosis of Parkinson’s disease using voice measurements |
| Algorithm Used | Random Forest Classifier |
| Accuracy | 92.3% |
| ROC-AUC Score | 0.962 |
| Dataset | UCI Parkinson’s Disease Dataset |
| Language | Python |
| Libraries | Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib |
- Data Collection
  The dataset was obtained from the UCI Machine Learning Repository. It consists of 195 voice recordings, each with 23 biomedical voice measures.
- Data Preprocessing
  - Dropped non-numeric columns (like patient name).
  - Scaled features using `StandardScaler`.
- Model Development
  - Used `RandomForestClassifier` from scikit-learn.
  - Split data into 80% training and 20% testing.
  - Trained and tuned hyperparameters for optimal performance.
- Evaluation Metrics
  - Accuracy
  - Precision, Recall, F1-Score
  - ROC-AUC Curve
  - Confusion Matrix
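
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual script: the function name `train_and_evaluate`, the label column name `status`, the `n_estimators=200` setting, and the fixed seed are all assumptions, and hyperparameter tuning is left out. The scaler is fitted on the training split only, which is one common way to avoid leaking test statistics.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(df, label_col="status", seed=42):
    """Train a Random Forest on a table of voice measures and score a held-out split.

    `label_col` is an assumed column name (1 = Parkinson's, 0 = healthy).
    """
    # Keep numeric columns only; this drops identifiers such as the patient name.
    numeric = df.select_dtypes(include="number")
    X = numeric.drop(columns=[label_col])
    y = numeric[label_col]

    # 80% training / 20% testing, stratified so both classes appear in each split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Fit the scaler on the training split, then apply it to both splits.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Hyperparameters here are illustrative, not the tuned values.
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train_scaled, y_train)

    predictions = model.predict(X_test_scaled)
    pd_probability = model.predict_proba(X_test_scaled)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, predictions),
        "roc_auc": roc_auc_score(y_test, pd_probability),
        "confusion_matrix": confusion_matrix(y_test, predictions),
    }
```

Calling `train_and_evaluate(pd.read_csv(path))` on a local copy of the dataset returns the accuracy, ROC-AUC, and confusion matrix for one split. The exact numbers depend on the seed and hyperparameters, so they will not necessarily match the scores reported here.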
| Metric | Score |
|---|---|
| Accuracy | 0.923 |
| Precision (PD) | 0.93 |
| Recall (PD) | 0.97 |
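
Confusion matrix (rows are actual classes, columns are predicted classes):

| | Predicted 0 (Healthy) | Predicted 1 (Parkinson’s) |
|---|---|---|
| Actual 0 (Healthy) | 8 | 2 |
| Actual 1 (Parkinson’s) | 1 | 28 |
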
| Class | Precision | Recall | F1-score |
|---|---|---|---|
| 0 (Healthy) | 0.89 | 0.80 | 0.84 |
| 1 (Parkinson’s) | 0.93 | 0.97 | 0.95 |
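
As a cross-check, the reported scores can be recomputed from the confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes, in the order 0 (Healthy), 1 (Parkinson’s); that orientation is the one consistent with the precision and recall values in the table. The helper name `per_class_metrics` is illustrative.

```python
import numpy as np

# Confusion matrix from the results above.
# Assumed layout: rows = actual class, columns = predicted class.
cm = np.array([[8, 2],
               [1, 28]])


def per_class_metrics(cm, cls):
    """Return (precision, recall, F1) for one class index of a confusion matrix."""
    tp = cm[cls, cls]
    fp = cm[:, cls].sum() - tp  # predicted as cls but actually another class
    fn = cm[cls, :].sum() - tp  # actually cls but predicted as another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Accuracy is the share of test samples on the diagonal: (8 + 28) / 39.
accuracy = np.trace(cm) / cm.sum()
print(f"accuracy = {accuracy:.3f}")

for cls, label in enumerate(["Healthy", "Parkinson's"]):
    p, r, f1 = per_class_metrics(cm, cls)
    print(f"{cls} ({label}): precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The matrix sums to 39 samples, which matches a 20% test split of the 195 recordings.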
- Voice-based biomarkers are a non-invasive and low-cost diagnostic tool.
- Random Forest outperformed linear models in accuracy and robustness.
- The model demonstrates potential for integration into telemedicine platforms or mobile diagnostic tools.
- Clone this repository:
  `git clone https://github.com/YOUR_USERNAME/Parkinson-s-Disease.git`
  `cd Parkinson-s-Disease`
- Install dependencies:
  `pip install pandas numpy scikit-learn seaborn matplotlib`
- Run the project:
  `python parkinsons_diagnosis.py`
- Experiment with Deep Learning models (LSTM, CNN) for audio feature extraction.
- Integrate the model with Streamlit for a web-based diagnostic tool.
- Perform feature importance analysis for interpretability.
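
One possible starting point for the feature importance item is the impurity-based `feature_importances_` attribute that scikit-learn exposes on a fitted `RandomForestClassifier`. The sketch below is only an illustration on synthetic data: the helper `rank_features` and the column names are hypothetical, and nothing here reflects results from the Parkinson’s dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(model, feature_names):
    """Return a fitted forest's impurity-based importances, largest first."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False)


# Synthetic demo: only the "informative" column carries signal about the label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = pd.DataFrame({
    "informative": rng.normal(size=200) + 3.0 * y,
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = rank_features(forest, X.columns)
print(ranking)
```

Impurity-based importances can be biased when features are strongly correlated, which is likely for related jitter and shimmer measures, so `sklearn.inspection.permutation_importance` on the test split may be a useful complement.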
- Little, Max A., et al. "Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection." BioMedical Engineering OnLine, 2007.
- UCI Parkinson’s Dataset
- Scikit-learn Documentation
- Aakriti Jain, Ujjawal Gaur
- B.Tech in Artificial Intelligence and Data Science
- GGSIPU, 2026
If you found this project interesting, give it a star on GitHub! 🌟