# Supervised Learning - TV-Show recommender
## Specification
TV-Show recommender is a program that recommends TV shows to watch based on a show you already like.
You enter the title of a TV show you like and the number of recommendations you want, and the program
returns that many TV shows, ranked by how closely they match your search.
### Data Source:
I will use a dataset from TMDB:
https://www.kaggle.com/datasets/asaniczka/full-tmdb-tv-shows-dataset-2023-150k-shows
### Model:
I will use the NearestNeighbors (NN) algorithm together with the K-NearestNeighbors (k-NN) algorithm.
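
As a rough illustration of the nearest-neighbor idea, the sketch below fits scikit-learn's `NearestNeighbors` on TF-IDF vectors of genre strings and looks up the closest shows for a given title. The four shows, the column names, the `recommend` function, and the choice of cosine distance are assumptions made for this example; the project's real model may be configured differently.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy data invented for illustration only (not taken from the TMDB dataset).
shows = pd.DataFrame({
    "name": ["Show A", "Show B", "Show C", "Show D"],
    "genres": ["Drama, Crime", "Crime, Drama, Thriller", "Comedy", "Comedy, Family"],
})

# Turn each genre string into a sparse TF-IDF vector.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(shows["genres"])

# Brute-force search with cosine distance (an assumption, not the project's setting).
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(features)


def recommend(title, n_recommendations):
    """Return the names of the shows closest to `title` (hypothetical helper)."""
    idx = shows.index[shows["name"] == title][0]
    # Ask for one extra neighbor because the query show also matches itself.
    k = min(n_recommendations + 1, features.shape[0])
    _, indices = model.kneighbors(features[idx], n_neighbors=k)
    neighbors = [i for i in indices[0] if i != idx][:n_recommendations]
    return shows["name"].iloc[neighbors].tolist()


print(recommend("Show A", 2))
```

The query show is filtered out of its own results, so asking for `n` recommendations requires searching for `n + 1` neighbors.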
### Features:
1. Load data from the dataset and preprocess it.
2. Train the model with the NN and k-NN algorithms.
3. User input
4. Recommendations
### Requirements:
1. Title data:
* Title
* Genres
* First/last air date
* Vote count/average
* Director
* Description
* Networks
* Spoken languages
* Number of seasons/episodes
2. User data:
* Which movie / TV show the user prefers
* Number of recommendations wanted
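
The title fields above mix text (genres, description) with numbers (vote average, season count). One possible way to combine them into a single matrix for the model is sketched below, using `TfidfVectorizer`, `StandardScaler`, and `scipy.sparse.hstack` from the references. The column names, the toy rows, and the `build_feature_matrix` function are assumptions for illustration, not the project's actual preprocessing.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

# Toy rows invented for illustration; column names are assumed, not confirmed.
titles = pd.DataFrame({
    "name": ["Show A", "Show B", "Show C"],
    "genres": ["Drama, Crime", "Comedy", "Drama"],
    "vote_average": [8.1, 7.4, 6.9],
    "number_of_seasons": [5, 2, 1],
})


def build_feature_matrix(frame):
    """Stack TF-IDF text features and scaled numeric features side by side."""
    # Text column -> sparse TF-IDF matrix (missing values become empty strings).
    text = TfidfVectorizer().fit_transform(frame["genres"].fillna(""))
    # Numeric columns -> zero mean, unit variance, so no column dominates distances.
    numeric = StandardScaler().fit_transform(
        frame[["vote_average", "number_of_seasons"]].fillna(0)
    )
    # hstack needs sparse inputs, so wrap the dense numeric array first.
    return hstack([text, csr_matrix(numeric)]).tocsr()


matrix = build_feature_matrix(titles)
print(matrix.shape)
```

Scaling the numeric columns matters here because nearest-neighbor distances are sensitive to the magnitude of each feature.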
### Libraries
* pandas: Data manipulation and analysis
* scikit-learn: Machine learning algorithms and preprocessing
* scipy: Scientific computing package for Python
* time: Functions for working with time
* os: Functions for interacting with the operating system
* re: Regular expression support
* textwrap: Text wrapping and filling
### Classes
1. LoadData
* load_data
* read_data
* clean_data
2. ImportData
* load_dataset
* create_data
* clean_data
* save_data
3. TrainModel
* train
* recommend
* preprocess_title_data
* preprocess_target_data
4. UserData
* input
* n_recommendations
5. RecommendationLoader
* run
* get_recommendations
* display_recommendations
* get_explanation
* check_genre_overlap
* check_created_by_overlap
* extract_years
* filter_genres
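
To give a feel for what two of the helpers named above might do, here is a minimal sketch written as standalone functions. The signatures, the input formats (ISO-style date strings, comma-separated genre strings), and the return values are all assumptions; the real methods on `RecommendationLoader` may look different.

```python
import re


def extract_years(first_air_date, last_air_date):
    """Return (first_year, last_year); a date without a 4-digit year gives None."""
    def year(value):
        match = re.search(r"\d{4}", str(value))
        return int(match.group()) if match else None

    return year(first_air_date), year(last_air_date)


def check_genre_overlap(genres_a, genres_b):
    """Return the sorted, lowercased genres shared by two comma-separated strings."""
    def split(text):
        return {genre.strip().lower() for genre in text.split(",") if genre.strip()}

    return sorted(split(genres_a) & split(genres_b))


print(extract_years("2008-01-20", "2013-09-29"))
print(check_genre_overlap("Drama, Crime", "crime, Thriller"))
```

Helpers like these could feed an explanation such as "both shows are crime series from the same period", which is presumably the role of `get_explanation`.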
### References
* https://scikit-learn.org/dev/modules/generated/sklearn.neighbors.NearestNeighbors.html
* https://scikit-learn.org/1.5/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
* https://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.StandardScaler.html
* https://scikit-learn.org/0.16/modules/generated/sklearn.decomposition.TruncatedSVD.html
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html
* https://maartengr.github.io/BERTopic/getting_started/embeddings/embeddings.html
## How to run the program
### Before running the program
First, extract TMDB_tv_dataset_v3.zip in the dataset folder so that the folder contains TMDB_tv_dataset_v3.csv.
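
If you prefer to do the extraction from Python rather than with a file manager, a small sketch using the standard-library `zipfile` module follows. The `extract_dataset` function is a hypothetical helper, and the path assumes the script is run from the project root.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, target_dir):
    """Extract every file in `zip_path` into `target_dir` and list what is there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(path.name for path in target.iterdir())


# Assumed location of the archive, relative to the project root.
archive_path = Path("dataset") / "TMDB_tv_dataset_v3.zip"
if archive_path.exists():
    print(extract_dataset(archive_path, archive_path.parent))
else:
    print(f"Archive not found at {archive_path}")
```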
### Running the program
Run main.py. It loads the dataset, then asks for a title to base the recommendations on and for the number of recommendations you want. After you enter both, the recommendations are displayed on screen.