Turi Create, Apple's open source machine learning framework, simplifies the development of custom machine learning models. It readily handles tasks such as image recognition, cluster analysis, and recommender systems. This article walks through a quick example of building a movie recommendation system with Turi Create.
I remember that when I took part in the Teddy Cup in 2018, I wrote 800 lines of code to implement a recommendation system. Now, with Turi Create, item-based CF can be done in about three lines of code.
Note: the data set used in this article is MovieLens 100K (the ml-100k directory and its u.data file).
Step 0. Import data:
import os
import pandas as pd
# Switch to the directory that holds the MovieLens files
os.chdir('/Users/zhbink/Desktop/ml-100k/')
data_cols = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_csv('u.data', sep='\t', names=data_cols, encoding='latin-1')
Here, user_id is the ID of each user who rated a movie, movie_id is the ID of the movie being rated, and rating is the rating that user gave the movie.
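To make the column layout concrete, here is a small sketch that parses a few rows in the same tab-separated format using only the standard library. The three sample rows and the helper name parse_ratings are made up for illustration; they are not taken from the real u.data file.

```python
import csv
import io

# Three made-up rows laid out like u.data:
# user_id <TAB> movie_id <TAB> rating <TAB> timestamp
SAMPLE = "1\t10\t4\t881250949\n1\t20\t3\t891717742\n2\t10\t5\t878887116\n"

COLUMNS = ["user_id", "movie_id", "rating", "timestamp"]

def parse_ratings(text):
    """Parse tab-separated rating rows into a list of dicts with int values."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(COLUMNS, map(int, row))) for row in reader if row]

rows = parse_ratings(SAMPLE)
for row in rows:
    print(row)
```

Each parsed row is one (user, movie, rating) observation, which is the shape of data the recommender works with in the following steps.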
Step 1. Build a recommendation system with Turi Create
Building a collaborative filtering model can be roughly divided into three steps:
- Split the data into a training set and a test set.
- Build the collaborative filtering model and obtain recommendations from the training set.
- Score the recommendations against the test set.
With Turi Create, each of these three steps takes just one line of code.
1. Partition training set and test set
For this data set, splitting the training and test sets by hand is awkward. A purely random split over all rows does not work, because all of a given user's ratings could end up entirely in the training set or entirely in the test set. Instead, the ratings of each user should be split.
You can do this easily with turicreate.recommender.util.random_split_by_user.
import turicreate as tc
train_data = tc.SFrame(ratings)  # Data needs to be converted to SFrame format
train, test = tc.recommender.util.random_split_by_user(train_data, user_id='user_id', item_id='movie_id')
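To see what a per-user split means conceptually, here is a minimal sketch in plain Python. It is not Turi Create's implementation, and the library function has its own options and may behave differently; the helper name split_by_user, the test_fraction parameter, and the sample ratings are assumptions made for illustration.

```python
import random
from collections import defaultdict

def split_by_user(ratings, test_fraction=0.2, seed=0):
    """Hold out roughly `test_fraction` of every user's ratings for testing.

    `ratings` is a list of (user_id, movie_id, rating) tuples.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for user_rows in by_user.values():
        rows = user_rows[:]  # copy so the caller's data is not reordered
        rng.shuffle(rows)
        # Always keep at least one rating per user in the training set.
        n_test = min(int(len(rows) * test_fraction), len(rows) - 1)
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test

# Made-up ratings: user 1 rated 10 movies, user 2 rated 5.
ratings = [(1, m, 3) for m in range(10)] + [(2, m, 4) for m in range(5)]
train, test = split_by_user(ratings)
print(len(train), len(test))
```

Because the split happens inside each user's own ratings, every user still appears in the training set, which is the property a row-level random split cannot guarantee.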
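2. Build the collaborative filtering model and get recommendations from the training set
item_content_recommender.create creates a content-based recommender model (item_content_recommender.ItemContentRecommender), which makes recommendations based on the similarity between item contents.
The example here uses item_similarity_recommender.create, which instead computes the similarity between items from the users' ratings, that is, item-based collaborative filtering.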
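For intuition about what cosine similarity between items means in item-based collaborative filtering, here is a minimal sketch in plain Python. It only illustrates the idea and is not how Turi Create computes it internally; the function name item_cosine_similarity and the sample ratings are made up, and ratings are assumed to be positive numbers.

```python
import math
from collections import defaultdict

def item_cosine_similarity(ratings):
    """Return {(item_a, item_b): cosine similarity} for items sharing a user.

    `ratings` is a list of (user_id, item_id, rating) tuples. Each item is
    treated as a vector of ratings indexed by user; a missing rating counts
    as 0. Ratings are assumed to be positive, so no vector has zero length.
    """
    item_vectors = defaultdict(dict)
    for user, item, rating in ratings:
        item_vectors[item][user] = rating

    norms = {item: math.sqrt(sum(r * r for r in vec.values()))
             for item, vec in item_vectors.items()}

    sims = {}
    items = sorted(item_vectors)
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            shared = item_vectors[a].keys() & item_vectors[b].keys()
            if not shared:
                continue  # no common user, so the dot product is 0
            dot = sum(item_vectors[a][u] * item_vectors[b][u] for u in shared)
            sims[(a, b)] = sims[(b, a)] = dot / (norms[a] * norms[b])
    return sims

# Made-up ratings: A and B are rated identically, C is rated differently.
ratings = [
    (1, "A", 5), (1, "B", 5),
    (2, "A", 4), (2, "B", 4), (2, "C", 1),
    (3, "C", 5),
]
sims = item_cosine_similarity(ratings)
print(sims[("A", "B")], sims[("A", "C")])
```

Items A and B are rated the same way by the same users, so their similarity is close to 1, while A and C share only one user and score much lower. A recommender built on this idea suggests items similar to the ones a user has already rated highly.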
# Train the model
item_sim_model = tc.item_similarity_recommender.create(train, user_id='user_id', item_id='movie_id', target='rating', similarity_type='cosine')
# Make recommendations for users 1 to 5, five movies each
item_sim_recomm = item_sim_model.recommend(users=[1, 2, 3, 4, 5], k=5)
item_sim_recomm.print_rows(num_rows=25)
Where:
- train: the SFrame containing the training data
- user_id: the column containing the ID of each user
- item_id: the column containing each item to be recommended (here, movie_id)
- target: the column containing the rating given by the user
Recommended results:
3. Compare the recommended results with the test set for scoring
item_sim_model.evaluate_precision_recall(test)
Result:
{'precision_recall_by_user': Columns:
    user_id    int
    cutoff     int
    precision  float
    recall     float
    count      int
Rows: 16956
Data:
+---------+--------+-----------+--------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+-----------+--------+-------+
|   196   |   1    |    0.0    |  0.0   |   7   |
|   196   |   2    |    0.0    |  0.0   |   7   |
|   196   |   3    |    0.0    |  0.0   |   7   |
|   196   |   4    |    0.0    |  0.0   |   7   |
|   196   |   5    |    0.0    |  0.0   |   7   |
|   196   |   6    |    0.0    |  0.0   |   7   |
|   196   |   7    |    0.0    |  0.0   |   7   |
|   196   |   8    |    0.0    |  0.0   |   7   |
|   196   |   9    |    0.0    |  0.0   |   7   |
|   196   |   10   |    0.0    |  0.0   |   7   |
+---------+--------+-----------+--------+-------+
[16956 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'precision_recall_overall': Columns:
    cutoff     int
    precision  float
    recall     float
Rows: 18
Data:
+--------+---------------------+----------------------+
| cutoff |      precision      |        recall        |
+--------+---------------------+----------------------+
|   1    | 0.45222929936305734 | 0.037110123899123465 |
|   2    | 0.40923566878980894 | 0.060494764740282043 |
|   3    |  0.3803963198867654 |  0.0820137151142372  |
|   4    |  0.357484076433121  | 0.10122765583444787  |
|   5    | 0.34288747346072185 |  0.1185950884547385  |
|   6    |  0.3315640481245577 | 0.13486665370299455  |
|   7    | 0.31922960266909306 |  0.1493753718327443  |
|   8    | 0.30506900212314225 | 0.16196475155837983  |
|   9    |  0.2945270110875208 | 0.17385805243866798  |
|   10   |  0.2845010615711252 | 0.18366774630733118  |
+--------+---------------------+----------------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
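The precision and recall columns can be read as follows, under the common definition of precision and recall at a cutoff k. This is a minimal sketch for a single user; the function name precision_recall_at_k and the sample movie IDs are made up for illustration, and Turi Create's exact computation may differ in details.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    `recommended` is a ranked list of item IDs, `relevant` is the set of
    items the user actually rated in the test set.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: five ranked recommendations, four held-out test movies.
recommended = [50, 181, 100, 258, 1]
relevant = {100, 1, 300, 7}
for k in (1, 3, 5):
    print(k, precision_recall_at_k(recommended, relevant, k))
```

Precision tends to fall as the cutoff grows because more recommendations are counted, while recall rises because more of the test movies get a chance to be hit. The overall table shows the same pattern.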
Step 2. Build a model of popular movies
In addition, Turi Create makes it easy to build a model that recommends the most popular movies to users (that is, every user receives the same recommendations). We can use popularity_recommender in Turi Create for this.
# Train the model (the package was imported as tc above)
popularity_model = tc.popularity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating')
# Make recommendations
popularity_recomm = popularity_model.recommend(users=[1, 2, 3, 4, 5], k=5)
popularity_recomm.print_rows(num_rows=25)
Recommended results:
All users receive the same recommendations here: 1201, 1122. That is to say, these are movies to which every user who rated them gave the highest score.
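The idea of ranking movies by their average rating can be sketched in a few lines of plain Python. This is an illustration of the concept, not Turi Create's implementation; the function name most_popular, the tie-breaking rule, and the sample ratings are assumptions.

```python
from collections import defaultdict

def most_popular(ratings, k=5):
    """Return the k item IDs with the highest mean rating.

    `ratings` is a list of (user_id, item_id, rating) tuples. Ties are
    broken by item ID so the result is deterministic.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in ratings:
        sums[item] += rating
        counts[item] += 1
    means = {item: sums[item] / counts[item] for item in sums}
    return sorted(means, key=lambda item: (-means[item], item))[:k]

# Made-up ratings: B has a single 5, A averages 4, C averages 4.
ratings = [
    (1, "A", 5), (2, "A", 3),
    (1, "B", 5),
    (3, "C", 4), (2, "C", 4),
]
top = most_popular(ratings, k=2)
print(top)
```

Note that an item with a single top rating outranks items that many users rated well, which is consistent with the observation that the top movies are those given the highest score by everyone who rated them.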