Project for a Python course (AI)

jwradhe 46715bff45 Big update and changing to flask webgui instead		2024-11-21 00:52:45 +01:00

Supervised Learning - TV-Show Recommender

How to Run the Program
Project Overview
Dataset
Model and Algorithm
Features
Requirements
Libraries
Classes
References

How to Run the Program

Prerequisites

Download and Extract the Dataset:
- Download the dataset from TMDB TV Dataset.
- Extract TMDB_tv_dataset_v3.zip into the dataset/ folder, so it contains the file TMDB_tv_dataset_v3.csv.
Install Dependencies:
- Install the necessary libraries listed in requirements.txt (see below).
Run the Program:
- Start the program by running the following command:
```
python main.py
```
- The program will load the dataset, ask for a TV show title to base recommendations on, and prompt for the number of recommendations.
- Note: The first time the program is run, it will generate Sentence-BERT embeddings. This can take up to 5 minutes due to the large size of the dataset.
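The note above suggests that embeddings are computed once and reused on later runs. A minimal sketch of that idea, assuming the embeddings are cached as a NumPy `.npy` file: the function name `load_or_compute_embeddings`, the cache path, and the `encode` callable are all hypothetical and stand in for whatever the project actually uses to produce Sentence-BERT embeddings.

```python
from pathlib import Path

import numpy as np


def load_or_compute_embeddings(cache_path, texts, encode):
    """Return embeddings for `texts`, reusing a cached .npy file when present.

    `encode` is a hypothetical callable that maps a list of strings to a
    2-D array-like of floats (for example, a Sentence-BERT model's encoder).
    `cache_path` should end in ".npy" so that save and load use the same name.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Later runs: skip the slow encoding step entirely.
        return np.load(cache_path)
    # First run: compute the embeddings and store them for next time.
    embeddings = np.asarray(encode(texts), dtype=np.float32)
    np.save(cache_path, embeddings)
    return embeddings
```

With a cache like this, only the first call pays the encoding cost. The cache would need to be deleted by hand if the dataset changes.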

Project Overview

The TV-Show Recommender is a machine learning-based program that suggests TV shows similar to one the user already likes. The system uses the Nearest Neighbors (NN) and K-Nearest Neighbors (K-NN) algorithms with cosine distance to measure how close two shows are. The user provides the title of a TV show they like, and the system returns the shows in the dataset that are most similar to it.
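For readers unfamiliar with cosine distance, here is a small illustrative sketch in plain Python. It is not taken from the project code; the function name `cosine_distance` is chosen only for this example. Cosine distance is one minus the cosine of the angle between two vectors, so vectors pointing the same way have distance 0 regardless of their length.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumes neither vector is all zeros (the norm would be 0).
    return 1.0 - dot / (norm_a * norm_b)
```

Two shows whose description vectors point in nearly the same direction get a distance close to 0, while unrelated (orthogonal) vectors get a distance of 1.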

Dataset

The dataset used in this project is sourced from TMDB (The Movie Database). It contains over 150,000 TV shows and includes information such as:

- Title of TV shows
- Genres
- First/Last air date
- Vote count and average rating
- Director/Creator information
- Overview/Description
- Networks
- Spoken languages
- Number of seasons/episodes

The dataset is distributed as the TMDB TV Dataset (TMDB_tv_dataset_v3.zip); see Prerequisites above for where to extract it.

Model and Algorithm

The recommender system is presented as a Supervised Learning project and relies on nearest-neighbor search: the NearestNeighbors estimator performs the K-Nearest Neighbors (K-NN) lookup. Here's a breakdown of the process:

Data Preprocessing:
- The TV show descriptions are vectorized using Sentence-BERT embeddings to create dense vector representations of each show's description.
Model Training:
- The NearestNeighbors (NN) algorithm is used with cosine distance to compute similarity between TV shows. The algorithm finds the most similar shows to a user-provided title.
Recommendation Generation:
- The model generates a list of recommended TV shows by finding the nearest neighbors of the input title using cosine similarity.
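The three steps above can be sketched with scikit-learn's `NearestNeighbors` estimator. This is only an illustration under stated assumptions: the titles and the tiny hand-written 3-dimensional vectors are made up and stand in for real Sentence-BERT embeddings, and the `recommend` helper is hypothetical rather than the project's own function.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up titles and toy "embeddings" (real ones would come from Sentence-BERT).
titles = ["Show A", "Show B", "Show C", "Show D"]
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])

# "Training" here just stores the vectors for brute-force cosine search.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(embeddings)


def recommend(title, n=2):
    """Return up to n titles closest to `title` by cosine distance."""
    idx = titles.index(title)
    # Ask for n + 1 neighbors because the query show is its own nearest neighbor.
    _, indices = model.kneighbors(embeddings[idx:idx + 1], n_neighbors=n + 1)
    return [titles[i] for i in indices[0] if i != idx][:n]
```

Calling `recommend("Show A", n=1)` looks up the vector for "Show A" and returns the single closest other show in this toy data.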

Features

Data Loading & Preprocessing:
- Loads the TV show data from a CSV file and preprocesses it for model training.
Model Training with K-NN:
- Trains a K-NN model using the NearestNeighbors algorithm for generating recommendations.
User Input for Recommendations:
- Accepts user input for the TV show title and the number of recommendations.
TV Show Recommendations:
- Returns a list of recommended TV shows based on similarity to the input TV show.

Requirements

Data Requirements:

The dataset should contain the following columns for each TV show:

- Title
- Genres
- First/Last air date
- Vote count/average
- Director
- Overview
- Networks
- Spoken languages
- Number of seasons/episodes
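A quick way to confirm that a CSV file has the expected columns is to inspect its header before loading everything. The sketch below uses only the standard library. The column names in `REQUIRED_COLUMNS` are hypothetical placeholders, since the exact header names in the CSV are not listed here, and `missing_columns` is a helper name invented for this example.

```python
import csv
import io

# Hypothetical header names; adjust to match the actual CSV header.
REQUIRED_COLUMNS = {"name", "genres", "overview"}


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return a sorted list of required column names absent from the header."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(set(required) - header)


# Usage with an in-memory file standing in for the real dataset:
sample = io.StringIO("name,genres\nExample Show,Drama\n")
print(missing_columns(sample))
```

For a real file, the same function can be called on `open(path, newline="", encoding="utf-8")`.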

User Input Requirements:

- TV Show Title: The name of the TV show you like.
- Number of Recommendations: The number of recommendations you want to receive (default is 10).
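The two inputs above could be handled roughly as follows. This is a sketch under assumptions, not the project's actual input code: `parse_count` and `find_title` are hypothetical helpers, and falling back to the default on invalid input is a design choice made for this example.

```python
def parse_count(raw, default=10):
    """Turn raw user text into a positive recommendation count.

    Empty, non-numeric, or non-positive input falls back to `default`.
    """
    raw = raw.strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def find_title(query, titles):
    """Return the index of the first case-insensitive exact match, or None."""
    wanted = query.strip().casefold()
    for index, title in enumerate(titles):
        if title.casefold() == wanted:
            return index
    return None
```

Matching case-insensitively avoids rejecting a title only because of capitalization. Partial or fuzzy matching would need additional logic.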

Libraries

The following libraries are required to run the program:

- pandas: For data manipulation and analysis.
- scikit-learn: For machine learning algorithms and preprocessing.
- scipy: For scientific computing (e.g., sparse matrices).
- time: For working with time-related functions (Python standard library, no installation needed).
- os: For interacting with the operating system (Python standard library).
- re: For regular expression support (Python standard library).
- textwrap: For text wrapping and formatting (Python standard library).
- flask: For creating the web interface.

To install the dependencies, run:

pip install -r requirements.txt