Supervised Learning - TV-Show Recommender
Table of Contents
- How to Run the Program
- Project Overview
- Dataset
- Model and Algorithm
- Features
- Requirements
- Libraries
- Classes
- References
How to Run the Program
Prerequisites
- Download and Extract the Dataset:
  - Download the dataset from TMDB TV Dataset.
  - Extract TMDB_tv_dataset_v3.zip into the dataset/ folder, so it contains the file TMDB_tv_dataset_v3.csv.
- Install Dependencies:
  - Install the necessary libraries listed in requirements.txt (see below).
- Run the Program:
  - Start the program by running the following command:
    python main.py
  - The program will load the dataset, ask for a TV show title to base recommendations on, and prompt for the number of recommendations.
  - Note: The first time the program is run, it will generate Sentence-BERT embeddings. This can take up to 5 minutes due to the large size of the dataset.
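The note about the slow first run suggests the embeddings are computed once and reused afterwards. A minimal sketch of that compute-once pattern follows. The helper `load_or_compute`, the file name `embeddings.pkl`, and the stand-in function `fake_embeddings` are all hypothetical; the project's actual caching code may work differently.

```python
import os
import pickle
import tempfile

def load_or_compute(cache_path, compute):
    """Return the cached result if cache_path exists, else compute and store it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    with open(cache_path, "wb") as fh:
        pickle.dump(result, fh)
    return result

calls = []

def fake_embeddings():
    # Hypothetical stand-in for the expensive Sentence-BERT encoding step.
    calls.append(1)
    return [[0.1, 0.2], [0.3, 0.4]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.pkl")  # hypothetical cache file name
    first = load_or_compute(path, fake_embeddings)   # computes and writes the cache
    second = load_or_compute(path, fake_embeddings)  # reads the cache instead
```

The second call never reaches `fake_embeddings`, which is why only the first run of a program built this way pays the embedding cost.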
Project Overview
The TV-Show Recommender is a machine-learning program that suggests TV shows similar to one the user already likes. It runs a K-Nearest Neighbors (K-NN) search with cosine distance: the user provides the title of a TV show, and the system returns the shows in the dataset that are closest to it.
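To make "cosine distance" concrete, here is a small plain-Python sketch of the metric. The function name `cosine_distance` is illustrative and not taken from the project; the project itself presumably relies on a library implementation.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way have a distance close to 0,
# regardless of their length.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])

# Orthogonal vectors have a distance of 1.
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
```

Because the metric ignores vector length, two shows count as similar when their feature vectors point in the same direction, not when the vectors have similar magnitude.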
Dataset
The dataset used in this project is sourced from TMDB (The Movie Database). It contains over 150,000 TV shows and includes information such as:
- Title of TV shows
- Genres
- First/Last air date
- Vote count and average rating
- Director/Creator information
- Overview/Description
- Networks
- Spoken languages
- Number of seasons/episodes
The dataset can be downloaded from the TMDB TV Dataset page, as described under How to Run the Program.
Model and Algorithm
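The recommender is built on the NearestNeighbors estimator, which performs a K-Nearest Neighbors (K-NN) search. Although the project is filed under Supervised Learning, scikit-learn's NearestNeighbors is fitted on the feature vectors alone, without target labels, so the search itself is unsupervised. Here's a breakdown of the process: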
- Data Preprocessing:
  - The TV show descriptions are vectorized using Sentence-BERT embeddings to create dense vector representations of each show's description.
- Model Training:
  - The NearestNeighbors (NN) algorithm is used with cosine distance to compute similarity between TV shows. The algorithm finds the most similar shows to a user-provided title.
- Recommendation Generation:
  - The model generates a list of recommended TV shows by finding the nearest neighbors of the input title using cosine similarity.
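The three steps above can be sketched with scikit-learn as follows. This is a toy illustration, not the project's code: the titles and the hand-written 3-dimensional vectors are made up and stand in for real Sentence-BERT embeddings, and the function `recommend` is a hypothetical name.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up titles and tiny vectors standing in for Sentence-BERT embeddings.
titles = ["Show A", "Show B", "Show C", "Show D"]
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])

# "Training" only indexes the vectors; no labels are involved.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(embeddings)

def recommend(title, n=2):
    """Return the n titles closest to the given title by cosine distance."""
    idx = titles.index(title)
    # Ask for n + 1 neighbors because the query show is its own nearest neighbor.
    _, indices = model.kneighbors(embeddings[idx:idx + 1], n_neighbors=n + 1)
    return [titles[i] for i in indices[0] if i != idx][:n]
```

Calling `recommend("Show A", 1)` returns `["Show B"]`, since the two vectors point in almost the same direction. The query show is filtered out explicitly so that it is never recommended back to the user.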
Features
- Data Loading & Preprocessing:
  - Loads the TV show data from a CSV file and preprocesses it for model training.
- Model Training with K-NN:
  - Trains a K-NN model using the NearestNeighbors algorithm for generating recommendations.
- User Input for Recommendations:
  - Accepts user input for the TV show title and the number of recommendations.
- TV Show Recommendations:
  - Returns a list of recommended TV shows based on similarity to the input TV show.
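As an illustration of the data loading and preprocessing feature, the sketch below reads a CSV and drops shows without a description, since such rows have nothing to embed. It uses the standard `csv` module on an in-memory sample for brevity, whereas the project lists pandas for this job. The column names `name` and `overview` and the function `load_shows` are assumptions, not confirmed by this README.

```python
import csv
import io

# In-memory sample with assumed column names; the real file is
# dataset/TMDB_tv_dataset_v3.csv.
SAMPLE_CSV = """name,overview
Show A,A detective solves crimes in a small town.
Show B,
Show C,Two chefs compete for a restaurant.
"""

def load_shows(handle):
    """Read rows and keep only those with a non-empty description."""
    rows = []
    for row in csv.DictReader(handle):
        overview = (row.get("overview") or "").strip()
        if overview:
            rows.append({"name": row["name"].strip(), "overview": overview})
    return rows

shows = load_shows(io.StringIO(SAMPLE_CSV))
```

Here `shows` keeps "Show A" and "Show C" and skips "Show B", whose description is empty.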
Requirements
Data Requirements:
The dataset should contain the following columns for each TV show:
- Title
- Genres
- First/Last air date
- Vote count/average
- Director
- Overview
- Networks
- Spoken languages
- Number of seasons/episodes
User Input Requirements:
- TV Show Title: The name of the TV show you like.
- Number of Recommendations: The number of recommendations you want to receive (default is 10).
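The two inputs could be validated along these lines. This is a sketch under assumptions: `parse_count` and `find_title` are hypothetical helpers, and the case-insensitive exact match is one possible matching rule, not necessarily the one the program uses.

```python
def parse_count(raw, default=10):
    """Return a positive integer from user input, falling back to default."""
    try:
        value = int(raw.strip())
    except ValueError:
        return default
    return value if value > 0 else default

def find_title(query, titles):
    """Case-insensitive exact match; returns the stored title or None."""
    wanted = query.strip().lower()
    for title in titles:
        if title.lower() == wanted:
            return title
    return None
```

With these helpers, an empty or non-numeric answer to the second prompt falls back to 10 recommendations, and a title typed in a different case still resolves to the stored spelling.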
Libraries
The program uses the following libraries (time, os, re, and textwrap are part of the Python standard library and need no installation):
- pandas: For data manipulation and analysis.
- scikit-learn: For machine learning algorithms and preprocessing.
- scipy: For scientific computing (e.g., sparse matrices).
- time: For working with time-related functions.
- os: For interacting with the operating system.
- re: For regular expression support.
- textwrap: For text wrapping and formatting.
- flask: For creating the web interface.
To install the dependencies, run:
pip install -r requirements.txt