Text mining project
Analyzed data from the defunct Yahoo Answers. The goal of this project was to try to predict the best answers to questions in the test set from similar quesitons in the training data using term frequenct-inverse document frequency (TF-IDF).
Used multinomial naive bayes and rocchio algorithms for finding similar questions.
Dataset used can be found here: https://www.kaggle.com/datasets/yacharki/yahoo-answers-10-categories-for-nlp-csv/data