commit 47725aa4b4

README.md (56 lines changed to 146)
# Supervised Learning - TV-Show Recommender

## Table of Contents

1. [How to Run the Program](#how-to-run-the-program)
2. [Project Overview](#project-overview)
3. [Dataset](#dataset)
4. [Model and Algorithm](#model-and-algorithm)
5. [Features](#features)
6. [Requirements](#requirements)
7. [Libraries](#libraries)
8. [Classes](#classes)
9. [References](#references)

## How to Run the Program

### Prerequisites

1. **Download and Extract the Dataset:**
   - Download the dataset from [TMDB TV Dataset](https://www.kaggle.com/datasets/asaniczka/full-tmdb-tv-shows-dataset-2023-150k-shows).
   - Extract `TMDB_tv_dataset_v3.zip` into the `dataset/` folder, so that it contains the file `TMDB_tv_dataset_v3.csv`.

2. **Install Dependencies:**
   - Install the necessary libraries listed in `requirements.txt` (see below).

### Running the Program

There are two ways to run the program, depending on whether you prefer the web-based interface or the command-line interface (CLI).

#### Web Interface (Flask)

To run the web-based interface (Flask application):

```bash
python app.py
```

- This starts a local web server; you can access the app through your browser (usually at http://127.0.0.1:5000/).
- The program loads the dataset, prompts you to enter a TV show title, and asks how many recommendations you want.
#### Command-Line Interface (CLI)

To run the command-line version of the program:

```bash
python main.py
```

- The program runs in the terminal, asking you to enter the title of a TV show you like and how many recommendations you want.

> [!NOTE]
> The first time the program is run, it generates **Sentence-BERT embeddings**. This can take up to 5 minutes due to the large size of the dataset.

---

## Project Overview

The **TV-Show Recommender** is a machine-learning-based program that suggests TV shows to users based on their preferences. The system uses the **Nearest Neighbors (NN)** and **K-Nearest Neighbors (K-NN)** algorithms with **cosine distance** to recommend TV shows. Users provide the title of a TV show they like, and the system returns recommendations based on similarity to other TV shows in the dataset.

---

## Dataset

The dataset used in this project is sourced from **TMDB** (The Movie Database). It contains over 150,000 TV shows and includes information such as:

- Title of TV shows
- Genres
- First/Last air date
- Vote count and average rating
- Director/Creator information
- Overview/Description
- Networks
- Spoken languages
- Number of seasons/episodes

Download the dataset from [here](https://www.kaggle.com/datasets/asaniczka/full-tmdb-tv-shows-dataset-2023-150k-shows).

---

## Model and Algorithm

The recommender system is based on **Supervised Learning** using the **NearestNeighbors** and **K-NearestNeighbors** algorithms. Here's a breakdown of the process:

1. **Data Preprocessing:**
   - The TV show descriptions are vectorized using **Sentence-BERT embeddings** to create dense vector representations of each show's description.

2. **Model Training:**
   - The **NearestNeighbors (NN)** algorithm is used with **cosine distance** to compute similarity between TV shows. The algorithm finds the shows most similar to a user-provided title.

3. **Recommendation Generation:**
   - The model generates a list of recommended TV shows by finding the nearest neighbors of the input title using cosine similarity.

---
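The three steps above can be sketched in a few lines. This is a toy illustration only: it swaps the Sentence-BERT embeddings for TF-IDF vectors so it runs without downloading a model, and the show descriptions are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy stand-ins for the real TMDB overviews
shows = [
    "A chemistry teacher starts cooking drugs with a former student",  # 0
    "A high school chemistry teacher cooks drugs with an ex student",  # 1
    "Knights and dragons fight over a medieval throne",                # 2
]

# Step 1: vectorize descriptions (the real program uses Sentence-BERT here)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(shows)

# Step 2: fit NearestNeighbors with cosine distance
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)

# Step 3: query with a show the user likes; the nearest hit is the show itself
distances, indices = nn.kneighbors(X[0])
print(indices[0])  # the query itself first, then the most similar other show
```

The same pattern scales to the full dataset; only the embedding step changes.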
## Features

1. **Data Loading & Preprocessing:**
   - Loads the TV show data from a CSV file and preprocesses it for model training.

2. **Model Training with K-NN:**
   - Trains a K-NN model using the **NearestNeighbors** algorithm for generating recommendations.

3. **User Input for Recommendations:**
   - Accepts user input for the TV show title and the number of recommendations.

4. **TV Show Recommendations:**
   - Returns a list of recommended TV shows based on similarity to the input TV show.

---

## Requirements

### Data Requirements:

The dataset should contain the following columns for each TV show:

- **Title**
- **Genres**
- **First/Last air date**
- **Vote count/average**
- **Director**
- **Overview**
- **Networks**
- **Spoken languages**
- **Number of seasons/episodes**

### User Input Requirements:

- **TV Show Title**: The name of the TV show you like.
- **Number of Recommendations**: The number of recommendations you want to receive (default is 10).

---

## Libraries

The following libraries are required to run the program:

- **pandas**: For data manipulation and analysis.
- **scikit-learn**: For machine learning algorithms and preprocessing.
- **scipy**: For scientific computing (e.g., sparse matrices).
- **time**: For working with time-related functions.
- **os**: For interacting with the operating system.
- **re**: For regular expression support.
- **textwrap**: For text wrapping and formatting.
- **flask**: For creating the web interface.

To install the dependencies, run:

```bash
pip install -r requirements.txt
```
app.py (new file, 84 lines)
```python
from flask import Flask, render_template, request
from readdata import LoadData
from recommendations import RecommendationLoader
from training import TrainModel

app = Flask(__name__)

# Load the data and train the model once at startup
data_loader = LoadData()
title_data = data_loader.load_data()

model = TrainModel(title_data)
model.train()

recommender = RecommendationLoader(model, title_data)


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/recommend', methods=['POST'])
def recommend():
    # Get and validate user input
    title = (request.form.get('title') or '').strip()
    if not title:
        return render_template('index.html', message="Please enter a valid TV show title.")

    try:
        n_recommendations = int(request.form.get('n_recommendations', 10))
        if n_recommendations < 1 or n_recommendations > 50:
            raise ValueError("Number of recommendations must be between 1 and 50.")
    except ValueError as e:
        return render_template('index.html', message=str(e))

    # Find the row in the dataset that matches the requested title
    target_row = title_data[title_data['name'].str.lower() == title.lower()]

    # Check if a match was found
    if target_row.empty:
        return render_template('index.html', message=f"No match found for '{title}'. Try again.")

    # Get recommendations
    target_row = target_row.iloc[0]
    user_data = {'title': title, 'n_rec': n_recommendations}
    recommendations = recommender.get_recommendations("flask", target_row, user_data)

    # Check if recommendations were found
    if recommendations is None or recommendations.empty:
        return render_template('index.html', message=f"Sorry, no recommendations available for {title}.")

    # Prepare data for display on the webpage
    recommendations_data = []
    for _, row in recommendations.iterrows():
        # Extract the first and last air years ("Unknown" means no end date)
        first_air_date = recommender.extract_years(row['first_air_date'])
        last_air_date = recommender.extract_years(row['last_air_date'])
        if last_air_date != "Unknown":
            years = f"{first_air_date} - {last_air_date}"
        else:
            years = f"{first_air_date}"

        recommendations_data.append({
            'title': row['name'],
            'genres': ', '.join(row['genres']) if isinstance(row['genres'], list) else row['genres'],
            'overview': row['overview'],
            'rating': row['vote_average'],
            'seasons': row['number_of_seasons'],
            'episodes': row['number_of_episodes'],
            'networks': ', '.join(row['networks']) if isinstance(row['networks'], list) and row['networks'] else 'N/A',
            'years': years,
        })

    return render_template('index.html', recommendations=recommendations_data, original_title=title)


if __name__ == '__main__':
    app.run(debug=True)
```
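The year-range formatting above relies on `extract_years` (defined in recommendations.py), which maps a `YYYY-MM-DD` string to its year and a NaN end date to `"Unknown"`. A standalone sketch of that helper's behavior:

```python
import pandas as pd

def extract_years(air_date):
    # NaN means the show has no known end date (e.g. still airing)
    if pd.isna(air_date):
        return "Unknown"
    # A bare year may arrive as a float after CSV round-tripping
    if isinstance(air_date, float):
        return str(int(air_date))
    return air_date.split('-')[0]

print(extract_years('2008-01-20'))   # '2008'
print(extract_years(float('nan')))   # 'Unknown'
```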
dataset/TMDB_tv_dataset_v3.zip (new binary file; binary file not shown)

dataset/dataset_tmdb.csv (216911 lines; file diff suppressed because it is too large)
import_data.py (new file, 76 lines)
```python
import re
import os
import pandas as pd


###############################################################
#### Class: ImportData
###############################################################
class ImportData:

    def __init__(self):
        self.data = None
        self.loaded_datasets = []

    ###########################################################
    #### Function: load_dataset
    ###########################################################
    def load_dataset(self, dataset_path):
        # Load data from a dataset CSV file
        try:
            df = pd.read_csv(os.path.join('dataset', dataset_path))
            return df
        except FileNotFoundError:
            print(f'Warning: "{dataset_path}" not found. Skipping this dataset.')
            return None

    ###########################################################
    #### Function: create_data
    ###########################################################
    def create_data(self, filename):
        # load_dataset already handles a missing file by returning None
        self.data = self.load_dataset(filename)
        if self.data is None:
            print("No data imported, missing dataset.")
        else:
            print("Imported data successfully.")

    ###########################################################
    #### Function: clean_data
    ###########################################################
    def clean_data(self):
        if self.data is not None:
            # Drop unnecessary columns
            df_cleaned = self.data.drop(columns=['adult', 'poster_path', 'production_companies',
                'in_production', 'backdrop_path', 'production_countries', 'status', 'episode_run_time',
                'original_name', 'popularity', 'tagline', 'homepage'], errors='ignore')

            # Keep only rows whose text columns are pure ASCII
            text_columns = ['name', 'overview', 'spoken_languages']
            masks = [df_cleaned[col].apply(lambda x: isinstance(x, str) and bool(re.match(r'^[\x00-\x7F]*$', x)))
                     for col in text_columns]
            combined_mask = pd.concat(masks, axis=1).all(axis=1)

            self.data = df_cleaned[combined_mask]

            print(f'Data cleaned. {self.data.shape[0]} records remaining.')
        else:
            print("No data to clean. Please load the dataset first.")

    ###########################################################
    #### Function: save_data
    ###########################################################
    def save_data(self):
        if self.data is not None:
            try:
                # Save the dataframe to CSV
                self.data.to_csv('data.csv', index=False)
                print('Data saved to data.csv.')
            except Exception as e:
                print(f'Error saving data: {e}')
        else:
            print("No data to save. Please clean the data first.")
```
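The row-filtering trick in `clean_data` — one boolean mask per text column, combined with `all(axis=1)` — is easiest to see on a toy frame (the show names here are made up):

```python
import re
import pandas as pd

df = pd.DataFrame({'name': ['Lost', 'Café Müller'],
                   'overview': ['plane crash drama', 'dance piece']})

text_columns = ['name', 'overview']
masks = [df[col].apply(lambda x: isinstance(x, str) and bool(re.match(r'^[\x00-\x7F]*$', x)))
         for col in text_columns]
combined_mask = pd.concat(masks, axis=1).all(axis=1)

# A row is dropped if ANY of its text columns contains a non-ASCII character
print(df[combined_mask]['name'].tolist())  # ['Lost']
```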
main.py (rewritten: 177 lines replaced by 23; the old single-file implementation moved into readdata.py, training.py, and recommendations.py)
```python
from readdata import LoadData
from training import TrainModel
from recommendations import RecommendationLoader


#########################################################################
#### function: main
#########################################################################
def main():
    # Load data from CSV file
    data_loader = LoadData()
    title_data = data_loader.load_data()

    # Train model
    model = TrainModel(title_data)
    model.train()

    # Run recommendation loader
    recommendations = RecommendationLoader(model, title_data)
    recommendations.run()
```
readdata.py (new file, 79 lines)
```python
import pandas as pd
from import_data import ImportData


#########################################################################
#### Class: LoadData
#########################################################################
class LoadData:
    def __init__(self):
        self.data = None
        self.filename = 'TMDB_tv_dataset_v3.csv'

    ###########################################################
    #### Function: load_data
    ###########################################################
    def load_data(self):
        self.read_data()
        self.clean_data()
        print(f'{self.data.shape[0]} titles loaded successfully.')
        return self.data

    ###########################################################
    #### Function: read_data
    ###########################################################
    def read_data(self):
        print("Starting to read data ...")
        try:
            # Try to read the preprocessed CSV file
            self.data = pd.read_csv('data.csv')
            print(f'{self.data.shape[0]} rows read successfully.')
        except FileNotFoundError:
            print("No data.csv file found. Attempting to import data...")
            # If the CSV file is not found, import the raw dataset instead
            try:
                data_importer = ImportData()
                data_importer.create_data(self.filename)
                data_importer.clean_data()
                data_importer.save_data()
                self.data = pd.read_csv('data.csv')
                print(f'{self.data.shape[0]} rows imported successfully.')
            except Exception as e:
                print(f"Error during data import process: {e}")

    ###########################################################
    #### Function: clean_data
    ###########################################################
    def clean_data(self):
        # Split a comma-separated string into a list; use an empty list for invalid data
        def split_to_list(value):
            if isinstance(value, str):
                # Strip and split the string, and remove any empty items
                return [item.strip() for item in value.split(',') if item.strip()]
            return []

        data_start = self.data.shape[0]

        # Split genres, spoken_languages, networks, and created_by into lists
        self.data['genres'] = self.data['genres'].apply(split_to_list)
        self.data['spoken_languages'] = self.data['spoken_languages'].apply(split_to_list)
        self.data['networks'] = self.data['networks'].apply(split_to_list)
        self.data['created_by'] = self.data['created_by'].apply(split_to_list)

        # Drop rows that are not in English
        self.data = self.data[self.data['original_language'] == 'en']

        # Drop rows with empty lists in genres, spoken_languages, or networks
        self.data = self.data[
            self.data['genres'].map(lambda x: len(x) > 0) &
            self.data['spoken_languages'].map(lambda x: len(x) > 0) &
            self.data['networks'].map(lambda x: len(x) > 0)
        ]

        # Count the rows that were dropped
        rows_dropped = data_start - len(self.data)
        print(f'Data cleaned successfully, dropped {rows_dropped} rows.')
```
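A quick check of `split_to_list`'s behavior on the two kinds of values these columns actually contain — a comma-separated string, and a NaN that pandas inserts for missing cells:

```python
def split_to_list(value):
    # Same helper as in LoadData.clean_data
    if isinstance(value, str):
        return [item.strip() for item in value.split(',') if item.strip()]
    return []

print(split_to_list('Drama, Crime, '))  # ['Drama', 'Crime'] — trailing comma is dropped
print(split_to_list(float('nan')))      # [] — NaN is a float, not a str
```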
recommendations.py (new file, 155 lines)
|
||||
from user import UserData
|
||||
import pandas as pd
|
||||
import textwrap
|
||||
|
||||
|
||||
###############################################################
|
||||
#### Class: RecommendationLoader
|
||||
###############################################################
|
||||
class RecommendationLoader:
|
||||
def __init__(self, model, title_data):
|
||||
self.model = model
|
||||
self.title_data = title_data
|
||||
|
||||
|
||||
###########################################################
|
||||
#### Function: run
|
||||
###########################################################
|
||||
def run(self):
|
||||
while True:
|
||||
user_data = UserData()
|
||||
user_data.title()
|
||||
user_data.n_recommendations()
|
||||
|
||||
# Exit the program if writing exit or quit.
|
||||
if user_data.user_data['title'] in ['exit', 'quit']:
|
||||
print("Program will exit now. Thanks for using!")
|
||||
break
|
||||
|
||||
# Find a row in dataset to use as referens.
|
||||
target_row = self.title_data[self.title_data['name'].str.lower() == user_data.user_data['title']]
|
||||
|
||||
# If no match found, loop and try again.
|
||||
if target_row.empty:
|
||||
print(f"No match found for '{user_data.user_data['title']}'. Try again.")
|
||||
continue
|
||||
|
||||
# If match found, get recommendations.
|
||||
target_row = target_row.iloc[0]
|
||||
self.get_recommendations(target_row, user_data.user_data)
|
||||
print("#" * 100)
|
||||
print("\nWrite 'exit' or 'quit' to end the program.")
|
||||
|
||||
|
||||
###########################################################
|
||||
#### Function: get_recommendations
|
||||
###########################################################
|
||||
def get_recommendations(self, type, target_row, user_data):
|
||||
recommendations = pd.DataFrame()
|
||||
n_recommendations = user_data['n_rec']
|
||||
|
||||
# Get more recommendations and filter untill n_recommendations is reached
|
||||
while len(recommendations) < n_recommendations:
|
||||
additional_recommendations = self.model.recommend(target_row, num_recommendations=20)
|
||||
additional_recommendations = additional_recommendations[~additional_recommendations.index.isin(recommendations.index)]
|
||||
additional_recommendations = self.filter_genres(additional_recommendations, target_row)
|
||||
recommendations = pd.concat([recommendations, additional_recommendations])
|
||||
|
||||
# Make sure we give n_recommendations recommendations
|
||||
recommendations = recommendations.head(n_recommendations)
|
||||
|
||||
if type == 'flask':
|
||||
return recommendations
|
||||
else:
|
||||
self.display_recommendations(user_data, recommendations, n_recommendations, target_row)
|
||||
|
||||
|
||||
###########################################################
|
||||
#### Function: display_recommendations
|
||||
###########################################################
|
||||
def display_recommendations(self, user_data, recommendations, n_recommendations, target_row):
|
||||
print(f'\n{n_recommendations} recommendations based on "{user_data["title"]}":\n')
|
||||
|
||||
# Width on printed recommendations
|
||||
width = 100
|
||||
|
||||
# Print recommendations if there are any
|
||||
if not recommendations.empty:
|
||||
# print(f"{'Title':<40} {'Genres':<60} {'Networks':<30}")
|
||||
print("#" * width)
|
||||
|
||||
for index, row in recommendations.iterrows():
|
||||
title = row['name']
|
||||
genres = ', '.join(row['genres']) if isinstance(row['genres'], list) else row['genres']
|
||||
networks = ', '.join(row['networks']) if isinstance(row['networks'], list) and row['networks'] else 'N/A'
|
||||
created_by = ', '.join(row['created_by']) if isinstance(row['created_by'], list) and row['created_by'] else 'N/A'
|
||||
rating = row['vote_average']
|
||||
vote_count = row['vote_count']
|
||||
seasons = row['number_of_seasons'] if isinstance(row['number_of_seasons'], int) else 'N/A'
|
||||
episodes = row['number_of_episodes'] if isinstance(row['number_of_episodes'], int) else 'N/A'
|
||||
overview = textwrap.fill(row["overview"], width=width)
|
||||
|
||||
# Extract years fir first_air_date and last_air_date
|
||||
first_year = self.extract_years(row["first_air_date"])
|
||||
last_year = self.extract_years(row["last_air_date"])
|
||||
|
||||
# Construct title with the year range
|
||||
title_raw = f"{title} ({first_year}-{last_year})"
|
||||
title = textwrap.fill(title_raw, width=width)
|
||||
|
||||
# Print recommendation
|
||||
print(f"\nTitle: {title}")
|
||||
print(f"Genres: {genres}")
|
||||
if not created_by == 'N/A':
|
||||
print(f"Director: {created_by}")
|
||||
if not networks == 'N/A':
|
||||
print(f'Networks: {networks}')
|
||||
print(f"Rating: {rating:.1f} ({vote_count:.0f} votes)")
|
||||
if not seasons == 'N/A' and not episodes == 'N/A':
|
||||
print(f"Seasons: {seasons} ({episodes} episodes)")
|
||||
print(f'\n{overview}\n')
|
||||
|
||||
print("-" * width)
|
||||
|
||||
print("\nEnd of recommendations.")
|
||||
else:
|
||||
print("No recommendations found.")
|
||||
|
||||
###########################################################
|
||||
#### Function: extract_years
|
||||
###########################################################
|
||||
def extract_years(self, air_date):
|
||||
# Make sure air_date is not null
|
||||
if pd.isna(air_date):
|
||||
return "Unknown"
|
||||
# Convert float to int if needed
|
||||
if isinstance(air_date, float):
|
||||
return str(int(air_date))
|
||||
return air_date.split('-')[0]
|
||||
|
||||
|
||||
    ###########################################################
    #### Function: filter_genres
    ###########################################################
    def filter_genres(self, recommendations, target_row):
        # Get genres from the target row (lower-cased for comparison)
        reference_genres = [genre.lower() for genre in target_row['genres']]

        # Drop recommendations tagged with genres the reference title
        # does not share (kids, animation, reality, documentary)
        for genre in ('kids', 'animation', 'reality', 'documentary'):
            if genre not in reference_genres:
                recommendations = recommendations[
                    ~recommendations['genres'].apply(
                        lambda x, g=genre: g in [item.lower() for item in x]
                    )
                ]

        return recommendations
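A minimal sketch of this genre filtering on a toy DataFrame (the column names match the ones used above, but the rows are invented for illustration):

```python
import pandas as pd

recs = pd.DataFrame({
    'name': ['Show A', 'Show B', 'Show C'],
    'genres': [['Drama'], ['Kids', 'Animation'], ['Documentary']],
})
reference_genres = ['drama', 'crime']  # the target title's genres, lower-cased

# Drop rows tagged with genres the reference title does not share
for genre in ('kids', 'animation', 'reality', 'documentary'):
    if genre not in reference_genres:
        recs = recs[~recs['genres'].apply(
            lambda x, g=genre: g in [item.lower() for item in x])]

print(recs['name'].tolist())  # ['Show A']
```

Only the drama title survives: the kids/animation and documentary rows are filtered out because the reference title carries none of those genres.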
requirements.txt (new file, 7 lines)
@@ -0,0 +1,7 @@
Flask==3.0.1
numpy==1.26.4
pandas==2.2.0
scikit-learn==1.4.1.post1
scipy==1.12.0
sentence-transformers==3.2.1
textwrap3==0.9.0
templates/index.html (new file, 92 lines)
@@ -0,0 +1,92 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Recommendation System</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 0;
            padding: 20px;
            background-color: #f4f4f9;
        }
        .container {
            max-width: 1200px;
            margin: 0 auto;
            padding: 20px;
        }
        h1 {
            text-align: center;
            color: #333;
        }
        form {
            text-align: center;
            margin-bottom: 20px;
        }
        .recommendation {
            background: #fff;
            padding: 15px;
            border-radius: 8px;
            margin-bottom: 20px;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
        }
        .recommendation h3 {
            margin-top: 0;
        }
        .recommendation p {
            margin: 5px 0;
        }
        .recommendation .overview {
            font-size: 14px;
            color: #555;
        }
        .error-message {
            color: red;
            text-align: center;
            font-weight: bold;
        }
    </style>
</head>
<body>

    <div class="container">
        <h1>TV-Show Recommendations</h1>

        <!-- Recommendation form -->
        <form method="POST" action="/recommend">
            <label for="title">Enter a Title (TV Show):</label><br><br>
            <input type="text" id="title" name="title" required><br><br>

            <label for="n_recommendations">Number of Recommendations:</label><br><br>
            <input type="number" id="n_recommendations" name="n_recommendations" value="10" min="1" max="50"><br><br>

            <input type="submit" value="Get Recommendations">
        </form>

        <!-- Display an error message, if any -->
        {% if message %}
        <div class="error-message">{{ message }}</div>
        {% endif %}

        <!-- Display recommendations -->
        {% if recommendations %}
        <h2>Recommendations based on "{{ original_title }}":</h2>
        <div class="recommendations">
            {% for rec in recommendations %}
            <div class="recommendation">
                <h3>{{ rec.title }} ({{ rec.years }})</h3>
                <p><strong>Genres:</strong> {{ rec.genres }}</p>
                <p><strong>Networks:</strong> {{ rec.networks }}</p>
                <p><strong>Rating:</strong> {{ rec.rating }}</p>
                <p><strong>Seasons:</strong> {{ rec.seasons }} ({{ rec.episodes }} episodes)</p>
                <p class="overview"><strong>Overview:</strong> {{ rec.overview }}</p>
            </div>
            {% endfor %}
        </div>
        {% endif %}

    </div>

</body>
</html>
training.py (new file, 146 lines)
@@ -0,0 +1,146 @@
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import hstack, csr_matrix
import pickle
import time

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module='sklearn')


#########################################################################
#### Class: TrainModel
#########################################################################
class TrainModel:
    def __init__(self, title_data):
        self.title_data = title_data

        # Initialize Sentence-BERT model for embeddings
        self.bert_model = SentenceTransformer('all-MiniLM-L12-v2')

        # TF-IDF vectorization settings
        self.vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2), min_df=0.01, max_df=0.5)

        # Nearest Neighbors settings
        self.nearest_neighbors = NearestNeighbors(metric='cosine')

        # Scaler for numerical features
        self.scaler = StandardScaler()

        # SVD for dimensionality reduction
        self.svd = TruncatedSVD(n_components=300)
    ###########################################################
    #### Function: train
    ###########################################################
    def train(self):
        print("Starting to train model...")

        start = time.time()

        # Preprocess title data, including the text embeddings
        preprocessed_data = self.preprocess_title_data()

        # Fit Nearest Neighbors on the combined feature set
        self.nearest_neighbors.fit(preprocessed_data)

        print(f'Trained model successfully in {time.time() - start:.2f} seconds.')
    ###########################################################
    #### Function: recommend
    ###########################################################
    def recommend(self, target_row, num_recommendations=40):
        # Preprocess target data
        target_vector = self.preprocess_target_data(target_row)

        # Use Nearest Neighbors to get candidate recommendations
        distances, indices = self.nearest_neighbors.kneighbors(target_vector, n_neighbors=num_recommendations)
        recommendations = self.title_data.iloc[indices[0]].copy()
        recommendations['distance'] = distances[0]

        # Drop the query title itself and anything too dissimilar
        recommendations = recommendations[
            (recommendations['name'].str.lower() != target_row['name'].lower()) &
            (recommendations['distance'] < 0.5)
        ]
        return recommendations.head(num_recommendations)
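The cosine-distance cut-off used here (`distance < 0.5`) can be seen in isolation with a tiny `NearestNeighbors` example; the vectors are toy stand-ins, not the real feature matrix:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[1.0, 0.0],   # the query itself
              [0.9, 0.1],   # nearly parallel -> tiny cosine distance
              [0.0, 1.0]])  # orthogonal -> cosine distance 1.0
nn = NearestNeighbors(metric='cosine').fit(X)

distances, indices = nn.kneighbors([[1.0, 0.0]], n_neighbors=3)
# Keep only neighbors closer than 0.5 in cosine distance
close = indices[0][distances[0] < 0.5]
print(close)  # [0 1]
```

The orthogonal vector is rejected by the threshold, which is exactly how dissimilar titles are pruned above.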
    ###########################################################
    #### Function: preprocess_title_data
    ###########################################################
    def preprocess_title_data(self):
        # Combine text fields for TF-IDF and BERT
        self.title_data['combined_text'] = (
            self.title_data['overview'].fillna('').apply(str) + ' ' +
            self.title_data['genres'].fillna('').apply(str) + ' ' +
            self.title_data['created_by'].fillna('').apply(str)
        )

        # TF-IDF + SVD
        text_features = self.vectorizer.fit_transform(self.title_data['combined_text'])
        text_features = self.svd.fit_transform(text_features)

        # Sentence-BERT embeddings (cached on disk after the first run)
        bert_embeddings = self.load_pickle('bert_embeddings.pkl', self.title_data['combined_text'])

        # Numerical features
        self.numerical_data = self.title_data.select_dtypes(include=['number'])
        numerical_features = self.scaler.fit_transform(self.numerical_data)
        numerical_features_sparse = csr_matrix(numerical_features)

        # Combine all features into one sparse matrix
        combined_features = hstack([csr_matrix(text_features), csr_matrix(bert_embeddings),
                                    numerical_features_sparse])
        return combined_features
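The feature combination above (dense SVD output, dense sentence embeddings, and scaled numeric columns stacked side by side into one sparse matrix) can be sketched with toy arrays; the shapes here are invented, only the `hstack` pattern matches the code:

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix

text_features = np.random.rand(4, 3)    # stands in for the SVD output
bert_embeddings = np.random.rand(4, 5)  # stands in for Sentence-BERT vectors
numerical = csr_matrix(np.random.rand(4, 2))  # scaled numeric columns

# Column-wise concatenation: every row keeps its features from all sources
combined = hstack([csr_matrix(text_features),
                   csr_matrix(bert_embeddings),
                   numerical])
print(combined.shape)  # (4, 10)
```

Wrapping the dense blocks in `csr_matrix` keeps the result sparse, so the numeric block's sparsity is preserved even though the text blocks are dense.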
    ###########################################################
    #### Function: preprocess_target_data
    ###########################################################
    def preprocess_target_data(self, target_row):
        # TF-IDF + SVD
        target_text_vector = self.vectorizer.transform([target_row['combined_text']])
        target_text_vector = self.svd.transform(target_text_vector)

        # Sentence-BERT embedding
        target_bert_embedding = self.embed_text(target_row['combined_text']).reshape(1, -1)

        # Numerical features
        target_numerical = target_row[self.numerical_data.columns].values.reshape(1, -1)
        target_numerical_scaled = self.scaler.transform(target_numerical)

        # Combine all features in the same order as training
        target_vector = hstack([csr_matrix(target_text_vector), csr_matrix(target_bert_embedding),
                                csr_matrix(target_numerical_scaled)])
        return target_vector
    ###########################################################
    #### Function: embed_text
    ###########################################################
    def embed_text(self, text):
        # Use Sentence-BERT to create an embedding for a single text
        return self.bert_model.encode(text, convert_to_numpy=True)
    ###########################################################
    #### Function: load_pickle
    ###########################################################
    def load_pickle(self, filename, title_data):
        # Load cached embeddings if they exist; otherwise compute and cache them
        try:
            with open(filename, 'rb') as f:
                bert_embeddings = pickle.load(f)
        except FileNotFoundError:
            print("Generating Sentence-BERT embeddings...")
            bert_embeddings = self.bert_model.encode(title_data.tolist(), batch_size=64, convert_to_numpy=True)
            with open(filename, 'wb') as f:
                pickle.dump(bert_embeddings, f)
        return bert_embeddings
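The cache-or-compute pattern in `load_pickle` is generic. A self-contained sketch, with a hypothetical `load_or_compute` helper and a cheap list standing in for the BERT encoding step:

```python
import os
import pickle
import tempfile

def load_or_compute(filename, compute):
    """Load a pickled result, or compute and cache it on a cache miss."""
    try:
        with open(filename, 'rb') as f:
            return pickle.load(f)
    except FileNotFoundError:
        result = compute()
        with open(filename, 'wb') as f:
            pickle.dump(result, f)
        return result

path = os.path.join(tempfile.gettempdir(), 'demo_cache.pkl')
if os.path.exists(path):
    os.remove(path)  # start from a cold cache for the demo

first = load_or_compute(path, lambda: [1, 2, 3])   # computes and caches
second = load_or_compute(path, lambda: [9, 9, 9])  # served from the cache
print(first, second)  # [1, 2, 3] [1, 2, 3]
```

The second call ignores its compute function entirely, which is why the cached `bert_embeddings.pkl` must be deleted whenever the underlying dataset changes.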
user.py (new file, 34 lines)
@@ -0,0 +1,34 @@
###############################################################
#### Class: UserData
###############################################################
class UserData:
    def __init__(self):
        self.user_data = {}
        self.n_rec = 10

    ###########################################################
    #### Function: title
    ###########################################################
    def title(self):
        # Ask for user input
        print("#" * 100)
        title = input("\nPlease enter the title of the TV show you like: ")
        self.user_data['title'] = title.strip().lower()
        return self.user_data

    ###########################################################
    #### Function: n_recommendations
    ###########################################################
    def n_recommendations(self):
        # Ask for the number of recommendations, retrying until valid
        while True:
            n_rec = input("How many recommendations do you want (minimum 5): ")
            try:
                n_rec = int(n_rec.strip())
                if n_rec < 5:
                    print("Please enter a number greater than or equal to 5.")
                else:
                    self.user_data['n_rec'] = n_rec
                    break
            except ValueError:
                print("Please enter a valid number.")