# Supervised Learning - TV-Show Recommender

## Table of Contents
1. [How to Run the Program](#how-to-run-the-program)
2. [Project Overview](#project-overview)
3. [Dataset](#dataset)
4. [Model and Algorithm](#model-and-algorithm)
5. [Features](#features)
6. [Requirements](#requirements)
7. [Libraries](#libraries)
8. [Classes](#classes)
9. [References](#references)

## How to Run the Program

### Prerequisites

1. **Download and Extract the Dataset:**
   - Download the dataset from [TMDB TV Dataset](https://www.kaggle.com/datasets/asaniczka/full-tmdb-tv-shows-dataset-2023-150k-shows).
   - Extract `TMDB_tv_dataset_v3.zip` into the `dataset/` folder, so it contains the file `TMDB_tv_dataset_v3.csv`.

2. **Install Dependencies:**
   - Install the necessary libraries listed in `requirements.txt` (see below).

3. **Run the Program:**
   - Start the program by running the following command:

   ```bash
   python main.py
   ```

   - The program will load the dataset, ask for a TV show title to base recommendations on, and prompt for the number of recommendations.

   - **Note:** The first time the program is run, it will generate **Sentence-BERT embeddings**. This can take up to 5 minutes due to the large size of the dataset.
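
The first-run delay suggests that the embeddings are generated once and reused afterwards. A minimal sketch of such a cache, assuming the embeddings are stored as a NumPy array on disk. The helper `load_or_compute_embeddings`, the cache path, and the stand-in `fake_encode` function are hypothetical and not taken from the project; a real run would pass a Sentence-BERT encoder instead.

```python
import os
import tempfile

import numpy as np


def load_or_compute_embeddings(texts, encode, cache_path):
    """Return cached embeddings if present, otherwise compute and save them."""
    if os.path.exists(cache_path):
        # Later runs: skip the slow encoding step entirely.
        return np.load(cache_path)
    embeddings = np.asarray(encode(texts), dtype=np.float32)
    np.save(cache_path, embeddings)
    return embeddings


def fake_encode(texts):
    """Hypothetical stand-in for a Sentence-BERT encoder (one 2-d vector per text)."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


overviews = ["A chemistry teacher turns to crime.", "Friends living in New York."]
cache_file = os.path.join(tempfile.mkdtemp(), "embeddings.npy")

first = load_or_compute_embeddings(overviews, fake_encode, cache_file)   # computes and saves
second = load_or_compute_embeddings(overviews, fake_encode, cache_file)  # loads from disk
print(first.shape, np.array_equal(first, second))
```
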

---

## Project Overview

The **TV-Show Recommender** is a machine learning program that suggests TV shows similar to one the user already likes. The user provides the title of a TV show, and the system uses the **Nearest Neighbors (NN)** and **K-Nearest Neighbors (K-NN)** algorithms with **cosine distance** to return the most similar shows in the dataset.

---

## Dataset

The dataset used in this project is sourced from **TMDB** (The Movie Database). It contains over 150,000 TV shows and includes information such as:

- Title of TV shows
- Genres
- First/Last air date
- Vote count and average rating
- Director/Creator information
- Overview/Description
- Networks
- Spoken languages
- Number of seasons/episodes

Download the dataset from [Kaggle](https://www.kaggle.com/datasets/asaniczka/full-tmdb-tv-shows-dataset-2023-150k-shows).

---

## Model and Algorithm

The recommender system is presented as a **Supervised Learning** project built on the **NearestNeighbors** and **K-NearestNeighbors** algorithms. Note that scikit-learn's `NearestNeighbors` is an unsupervised neighbor search: it is fitted on the embeddings alone, without labels. Here's a breakdown of the process:

1. **Data Preprocessing:**
   - Each TV show description is encoded with **Sentence-BERT** into a dense embedding vector.

2. **Model Training:**
   - A **NearestNeighbors (NN)** model is fitted on the embeddings with **cosine distance** as the metric, so that similar descriptions end up close to each other.

3. **Recommendation Generation:**
   - The model looks up the embedding of the user-provided title and returns its nearest neighbors, i.e. the shows with the smallest cosine distance (highest cosine similarity) to it.
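
The three steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the project's actual code: the titles are made up, and the hand-written 3-d vectors stand in for real Sentence-BERT embeddings. The last lines check that the reported distance equals one minus the cosine similarity.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy 3-d vectors standing in for Sentence-BERT description embeddings.
titles = ["Show A", "Show B", "Show C", "Show D"]
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Fit a brute-force neighbor index that ranks by cosine distance.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(embeddings)

# Query with "Show A"; ask for 2 neighbors because the nearest one is the show itself.
query = embeddings[[0]]
distances, indices = model.kneighbors(query, n_neighbors=2)

# Cosine distance is 1 minus cosine similarity.
a, b = embeddings[0], embeddings[1]
manual_distance = 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

nearest_other = titles[indices[0][1]]
print(nearest_other, float(distances[0][1]), float(manual_distance))
```
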

---

## Features

1. **Data Loading & Preprocessing:**
   - Loads the TV show data from a CSV file and preprocesses it for model training.

2. **Model Training with K-NN:**
   - Fits a K-NN model on the description embeddings using the **NearestNeighbors** algorithm.

3. **User Input for Recommendations:**
   - Accepts user input for the TV show title and the number of recommendations.

4. **TV Show Recommendations:**
   - Returns a list of recommended TV shows based on similarity to the input TV show.
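
As a rough illustration of how features 2 to 4 could fit together, the sketch below wraps the neighbor search in a title lookup. The `recommend` helper, the case-insensitive matching, and the toy titles and vectors are assumptions made for this example; the project's own classes may be organized differently.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def recommend(title, titles, embeddings, model, n_recommendations=10):
    """Return up to n_recommendations titles closest to `title` (hypothetical helper)."""
    # Case-insensitive lookup; if a title occurs twice, the last occurrence wins.
    lookup = {t.lower(): i for i, t in enumerate(titles)}
    key = title.strip().lower()
    if key not in lookup:
        raise ValueError(f"Unknown TV show title: {title!r}")
    idx = lookup[key]
    # Ask for one extra neighbor so the query show can be dropped from the results.
    k = min(n_recommendations + 1, len(titles))
    _, indices = model.kneighbors(embeddings[[idx]], n_neighbors=k)
    return [titles[i] for i in indices[0] if i != idx][:n_recommendations]


# Made-up titles with 2-d vectors standing in for description embeddings.
titles = ["Space Docs", "Galaxy Tales", "Baking Fun", "Cake Wars"]
embeddings = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
])
model = NearestNeighbors(metric="cosine", algorithm="brute").fit(embeddings)

print(recommend("space docs", titles, embeddings, model, n_recommendations=2))
```
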

---

## Requirements

### Data Requirements:
The dataset should contain the following columns for each TV show:
- **Title**
- **Genres**
- **First/Last air date**
- **Vote count/average**
- **Director/Creator**
- **Overview**
- **Networks**
- **Spoken languages**
- **Number of seasons/episodes**
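
A check like the following could verify these columns right after loading. It is a sketch under assumptions: the column names in `REQUIRED_COLUMNS` are guesses that should be compared against the header of `TMDB_tv_dataset_v3.csv`, the `load_shows` helper is hypothetical, and dropping rows without an overview is an illustrative choice rather than documented behavior. A small in-memory CSV stands in for the real file.

```python
import io

import pandas as pd

# Assumed column names; check them against the header of TMDB_tv_dataset_v3.csv.
REQUIRED_COLUMNS = [
    "name", "genres", "first_air_date", "last_air_date", "vote_count",
    "vote_average", "created_by", "overview", "networks",
    "spoken_languages", "number_of_seasons", "number_of_episodes",
]


def load_shows(source):
    """Read the CSV and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Dataset is missing columns: {missing}")
    # Shows without a description cannot be embedded, so drop them (an assumption).
    return df.dropna(subset=["overview"]).reset_index(drop=True)


# Tiny in-memory sample standing in for dataset/TMDB_tv_dataset_v3.csv.
sample = io.StringIO(
    ",".join(REQUIRED_COLUMNS) + "\n"
    "Show A,Drama,2008-01-20,2013-09-29,100,8.9,Someone,A teacher's story.,AMC,English,5,62\n"
    "Show B,Comedy,1994-09-22,2004-05-06,90,8.4,Someone Else,,NBC,English,10,236\n"
)
shows = load_shows(sample)
print(len(shows), list(shows["name"]))
```

With the real dataset, the call would be `load_shows("dataset/TMDB_tv_dataset_v3.csv")`.
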

### User Input Requirements:
- **TV Show Title**: The name of the TV show you like.
- **Number of Recommendations**: The number of recommendations you want to receive (default is 10).
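
One way to apply the default of 10 is sketched below. The function name `parse_recommendation_count` is hypothetical, and silently falling back to the default on empty, non-numeric, or non-positive input is an assumption about the desired behavior.

```python
def parse_recommendation_count(raw, default=10):
    """Turn the user's answer into a positive integer, falling back to the default."""
    text = raw.strip()
    if not text:
        return default
    try:
        count = int(text)
    except ValueError:
        return default
    return count if count > 0 else default


print(parse_recommendation_count(""))     # empty answer
print(parse_recommendation_count("5"))    # valid number
print(parse_recommendation_count("abc"))  # not a number
```
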

---

## Libraries

The program uses the following libraries. Entries marked *standard library* ship with Python and do not need to be installed:

- **pandas**: For data manipulation and analysis.
- **scikit-learn**: For machine learning algorithms and preprocessing.
- **scipy**: For scientific computing (e.g., sparse matrices).
- **time** (standard library): For working with time-related functions.
- **os** (standard library): For interacting with the operating system.
- **re** (standard library): For regular expression support.
- **textwrap** (standard library): For text wrapping and formatting.
- **flask**: For creating the web interface.

To install the third-party dependencies, run:

```bash
pip install -r requirements.txt