Building a recommendation system with three lines of Turi Create code


Turi Create, Apple's open source machine learning framework, simplifies the development of custom machine learning models. It can easily handle tasks such as image recognition, cluster analysis, and recommendation systems. This article walks through an example of quickly building a movie recommendation system with Turi Create.

I remember that when I took part in the 2018 Teddy Cup, I wrote 800 lines of code to implement a recommendation system. Now, with Turi Create, item-based collaborative filtering (CF) can be done in three lines of code.

Note: this article uses the MovieLens 100K data set (the ml-100k directory and its u.data file loaded below).

Step 0. Import data:

import os
import pandas as pd
os.chdir('/Users/zhbink/Desktop/ml-100k/')

data_cols = ['user_id','movie_id','rating','timestamp']
ratings = pd.read_csv('u.data', sep='\t', names=data_cols, encoding='latin-1')

Here, user_id is the ID of each user who rated movies, movie_id is the ID of the movie the user rated, and rating is the score the user gave that movie.

Step 1. Build a recommendation system with Turi Create

Building a collaborative filtering model can be roughly divided into three steps:

  1. Split the data into a training set and a test set.
  2. Build the collaborative filtering model and get recommendations from the training set.
  3. Score the recommendations by comparing them with the test set.

With Turi Create, each of these three steps takes just one line of code.

1. Split the data into a training set and a test set
For this data set, it is hard to split the training and test sets by hand. Splitting all the ratings at random does not work, because all of a given user's data could end up entirely in the training set or entirely in the test set. Instead, each user's ratings should be split.
You can do this easily with turicreate.recommender.util.random_split_by_user.

import turicreate as tc
train_data = tc.SFrame(ratings)  # Data needs to be converted to SFrame format
train, test = tc.recommender.util.random_split_by_user(train_data, user_id='user_id', item_id='movie_id')
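
To make the idea of a per-user split concrete, here is a minimal sketch in plain Python. It is not Turi Create's implementation: the function name split_by_user, the test_fraction and seed parameters, and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def split_by_user(rows, test_fraction=0.2, seed=0):
    """Hold out a fraction of each user's ratings for testing.

    rows: list of (user_id, movie_id, rating) tuples.
    Returns (train_rows, test_rows).
    Hypothetical helper, not part of Turi Create.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    train_rows, test_rows = [], []
    for user_rows in by_user.values():
        shuffled = user_rows[:]
        rng.shuffle(shuffled)
        # Keep at least one rating per user in the training set.
        n_test = min(int(len(shuffled) * test_fraction), len(shuffled) - 1)
        test_rows.extend(shuffled[:n_test])
        train_rows.extend(shuffled[n_test:])
    return train_rows, test_rows

# Toy data: 3 users with 10 ratings each (values are made up).
toy_rows = [(u, m, 1 + (u + m) % 5) for u in range(1, 4) for m in range(1, 11)]
toy_train, toy_test = split_by_user(toy_rows)
print(len(toy_train), len(toy_test))  # 24 6
```

Because the split happens inside each user's own ratings, every user who appears in the test set also has ratings in the training set, which is the property a fully random split cannot guarantee.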

2. Build the collaborative filtering model and get recommendations from the training set
item_similarity_recommender.create creates an item-based collaborative filtering model (an ItemSimilarityRecommender).
It makes recommendations based on the similarity between items.
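
As a rough illustration of what "similarity between items" means for the cosine measure used below, the following sketch computes the textbook cosine similarity between items from their ratings. It is a simplified stand-in, not Turi Create's internal code; the function name item_cosine_similarity and the toy ratings are assumptions.

```python
import math
from collections import defaultdict

def item_cosine_similarity(rows):
    """Cosine similarity between every pair of items that share a rater.

    rows: list of (user_id, item_id, rating) tuples.
    Returns a dict mapping (item_a, item_b) to a similarity value.
    Hypothetical helper, not part of Turi Create.
    """
    ratings_by_item = defaultdict(dict)
    for user, item, rating in rows:
        ratings_by_item[item][user] = rating

    # Length of each item's rating vector over all users who rated it.
    norms = {item: math.sqrt(sum(r * r for r in users.values()))
             for item, users in ratings_by_item.items()}

    sims = {}
    items = sorted(ratings_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = ratings_by_item[a].keys() & ratings_by_item[b].keys()
            if not shared:
                continue  # no common raters, so no similarity entry
            dot = sum(ratings_by_item[a][u] * ratings_by_item[b][u]
                      for u in shared)
            sims[(a, b)] = sims[(b, a)] = dot / (norms[a] * norms[b])
    return sims

# Toy ratings (made up): movies 10 and 20 are rated identically by users 1 and 2.
toy_ratings = [
    (1, 10, 5), (1, 20, 5),
    (2, 10, 4), (2, 20, 4),
    (3, 30, 3),
]
toy_sims = item_cosine_similarity(toy_ratings)
print(toy_sims[(10, 20)])  # very close to 1.0
```

Movies that are rated alike by the same users get a similarity near 1, and an item-based recommender suggests movies that are similar to the ones a user has already rated.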

# training the model
item_sim_model = tc.item_similarity_recommender.create(train, user_id='user_id', item_id='movie_id', target='rating', similarity_type='cosine')

# making recommendations
item_sim_recomm = item_sim_model.recommend(users=[1,2,3,4,5],k=5)
item_sim_recomm.print_rows(num_rows=25)

In the create call above:

  • train: the SFrame containing the training data we need
  • user_id: the name of the column containing each user's ID
  • item_id: the name of the column containing each item to be recommended (here, movie_id)
  • target: the name of the column containing the rating given by the user

Recommended results:

3. Compare the recommended results with the test set for scoring

item_sim_model.evaluate_precision_recall(test)

Result:

{'precision_recall_by_user': Columns:
 	user_id	int
 	cutoff	int
 	precision	float
 	recall	float
 	count	int
 
 Rows: 16956
 
 Data:
 +---------+--------+-----------+--------+-------+
 | user_id | cutoff | precision | recall | count |
 +---------+--------+-----------+--------+-------+
 |   196   |   1    |    0.0    |  0.0   |   7   |
 |   196   |   2    |    0.0    |  0.0   |   7   |
 |   196   |   3    |    0.0    |  0.0   |   7   |
 |   196   |   4    |    0.0    |  0.0   |   7   |
 |   196   |   5    |    0.0    |  0.0   |   7   |
 |   196   |   6    |    0.0    |  0.0   |   7   |
 |   196   |   7    |    0.0    |  0.0   |   7   |
 |   196   |   8    |    0.0    |  0.0   |   7   |
 |   196   |   9    |    0.0    |  0.0   |   7   |
 |   196   |   10   |    0.0    |  0.0   |   7   |
 +---------+--------+-----------+--------+-------+
 [16956 rows x 5 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'precision_recall_overall': Columns:
 	cutoff	int
 	precision	float
 	recall	float
 
 Rows: 18
 
 Data:
 +--------+---------------------+----------------------+
 | cutoff |      precision      |        recall        |
 +--------+---------------------+----------------------+
 |   1    | 0.45222929936305734 | 0.037110123899123465 |
 |   2    | 0.40923566878980894 | 0.060494764740282043 |
 |   3    |  0.3803963198867654 |  0.0820137151142372  |
 |   4    |  0.357484076433121  | 0.10122765583444787  |
 |   5    | 0.34288747346072185 |  0.1185950884547385  |
 |   6    |  0.3315640481245577 | 0.13486665370299455  |
 |   7    | 0.31922960266909306 |  0.1493753718327443  |
 |   8    | 0.30506900212314225 | 0.16196475155837983  |
 |   9    |  0.2945270110875208 | 0.17385805243866798  |
 |   10   |  0.2845010615711252 | 0.18366774630733118  |
 +--------+---------------------+----------------------+
 [18 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
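
To help read the precision and recall columns above, here is a small sketch of how these two metrics are commonly computed for one user at a given cutoff. It follows the usual textbook definitions and is not taken from Turi Create's source; the function name precision_recall_at_k and the toy item IDs are assumptions.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    recommended: ranked list of item ids (best first).
    relevant: set of item ids the user has in the test set.
    Hypothetical helper, not part of Turi Create.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example (made-up ids): 5 ranked recommendations, 4 test-set items.
toy_recommended = [50, 181, 100, 258, 1]
toy_relevant = {181, 1, 300, 7}
for k in (1, 2, 5):
    print(k, precision_recall_at_k(toy_recommended, toy_relevant, k))
# 1 (0.0, 0.0)
# 2 (0.5, 0.25)
# 5 (0.4, 0.5)
```

As the cutoff grows, recall can only rise while precision tends to fall, which matches the trend in the precision_recall_overall table.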

Step 4. Build a model of popular movies
In addition, Turi Create makes it easy to build a model that recommends the most popular movies to users (that is, all users receive the same recommendations). We can use the popularity_recommender in Turi Create to do this.

# training model
popularity_model = tc.popularity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating')  # use the tc alias imported earlier

# making recommendations
popularity_recomm = popularity_model.recommend(users=[1,2,3,4,5],k=5)
popularity_recomm.print_rows(num_rows=25)

Recommended results:

All users receive the same recommendations here, including movies 1201 and 1122. In other words, these are the movies that received the highest ratings from all the users who rated them.
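
The idea of ranking movies by the ratings they received can be sketched in a few lines of plain Python. This is an illustrative approximation that ranks by mean rating, not Turi Create's own code; the function name top_items_by_mean_rating and the toy ratings are assumptions.

```python
from collections import defaultdict

def top_items_by_mean_rating(rows, k=5):
    """Return the k item ids with the highest mean rating.

    rows: list of (user_id, item_id, rating) tuples.
    Hypothetical helper, not part of Turi Create.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in rows:
        totals[item] += rating
        counts[item] += 1
    means = {item: totals[item] / counts[item] for item in totals}
    # Sort by mean rating (highest first); break ties by item id.
    return sorted(means, key=lambda item: (-means[item], item))[:k]

# Toy ratings (made up).
toy_rows = [
    (1, 10, 5), (2, 10, 3),   # movie 10: mean 4.0
    (1, 20, 5),               # movie 20: mean 5.0 from a single rating
    (2, 30, 2), (3, 30, 4),   # movie 30: mean 3.0
]
print(top_items_by_mean_rating(toy_rows, k=2))  # [20, 10]
```

Note that in this sketch a movie with a single five-star rating outranks movies rated by many users, which may be why little-known titles can appear at the top of such a list.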


Posted by scuff on Sun, 12 Jan 2020 04:00:02 -0800