lravelb/hospital-reviews


🧠 Patient Sentiment Analysis Using NLP & LLMs

This project analyzes open-ended patient reviews to predict sentiment (positive or negative) using Natural Language Processing (NLP) techniques and Large Language Models (LLMs).

🚀 Project Overview

  • Goal: Classify patient feedback based on sentiment
  • Dataset: 996 hospital reviews with labeled sentiment
  • Techniques used:
    • Text cleaning and preprocessing (NLTK, regex)
    • Exploratory Data Analysis (word clouds, word frequencies, review length)
    • Feature extraction with TF-IDF
    • Sentiment classification using:
      • Logistic Regression (baseline)
      • distilBERT LLM from Hugging Face (zero-shot)

🗂️ Project Structure


patient-sentiment-healthcare/
├── data/
│   ├── dataset_hospital_reviews.csv            # raw
│   └── dataset_hospital_reviews_cleaned.csv    # processed
├── notebooks/
│   ├── 01_data_cleaning_and_eda.ipynb          # Data cleaning + EDA
│   └── 02_modeling_and_llm_comparison.ipynb    # Model training + LLM comparison
└── README.md

📊 Results Summary

Logistic Regression (TF-IDF)

  • Accuracy: 0.86
  • High precision on positive class
  • Poor recall on negative class

distilBERT (LLM)

  • Accuracy: 0.78
  • Much better at identifying negative reviews
  • Balanced recall across classes

🧪 Example Review

"Wait hour despite appointment isn't first time happened understanding manage appointment queue it's random unorganised lot scope improve"

--> Detected as NEGATIVE by distilBERT
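
For readers who want to try a classification like the one above, here is a minimal sketch using the Hugging Face `pipeline` API. The README does not name the exact checkpoint, so `distilbert-base-uncased-finetuned-sst-2-english` is an assumption, and `to_label` and `classify_reviews` are hypothetical helper names introduced for this example. The model is used as published, with no fine-tuning on the hospital reviews.

```python
# Assumed checkpoint: the README only says "distilBERT from Hugging Face".
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"


def to_label(result: dict) -> str:
    """Map a pipeline result such as {'label': 'NEGATIVE', 'score': 0.98}
    to the lowercase label style used elsewhere in this README."""
    return result["label"].lower()


def classify_reviews(texts, model_name: str = MODEL_NAME):
    """Classify each review as 'positive' or 'negative' with a pre-trained model."""
    # Imported here so the helpers above stay usable without transformers installed.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=model_name)
    # truncation=True keeps long reviews within the model's input length limit.
    results = classifier(list(texts), truncation=True)
    return [to_label(r) for r in results]


# Example call (downloads the model weights on first use):
# classify_reviews(["Wait hour despite appointment ..."])
```

The first call downloads the model weights, so it needs network access. A GPU runtime in Colab speeds up scoring but is not required for a dataset of this size.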

🛠️ Tech Stack

  • Python, Pandas, Scikit-learn, NLTK, Matplotlib, Seaborn
  • Hugging Face Transformers (distilBERT)
  • Google Colab (for LLM execution)

How to Run

  1. Open 02_modeling_and_llm_comparison.ipynb in Google Colab
  2. Mount your Google Drive and upload the cleaned dataset (or use the one provided)
  3. Run the cells to explore, train, and evaluate both models
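
Steps 1 and 2 might look roughly like the sketch below inside a Colab cell. The helper names `mount_drive_if_colab` and `load_reviews` are hypothetical, and the Drive path in the usage comment is only an example location. Adjust it to wherever you uploaded the cleaned CSV.

```python
from pathlib import Path

import pandas as pd


def mount_drive_if_colab(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running in Colab; do nothing elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def load_reviews(csv_path: str) -> pd.DataFrame:
    """Load the cleaned reviews CSV, failing early with a clear message."""
    path = Path(csv_path)
    if not path.exists():
        raise FileNotFoundError(f"Cleaned dataset not found at {path}")
    return pd.read_csv(path)


# Usage inside Colab (example path, adjust to your own Drive layout):
# mount_drive_if_colab()
# df = load_reviews("/content/drive/MyDrive/dataset_hospital_reviews_cleaned.csv")
# df.head()
```

Outside Colab, `mount_drive_if_colab` returns `False`, so the same loader also works against a local copy of the file.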

About

NLP project for analyzing patient satisfaction reviews using traditional machine learning and large language models (LLMs). Compares logistic regression with distilBERT on sentiment classification.
