Recommendation Algorithms: Item-based Collaborative Filtering Algorithms

Keywords: Big Data Python Lambda less

  • Reference to "Recommendation System Practice" Item Liang
  • Concept: Item-based collaborative filtering algorithm, optimization algorithm
  • Contrast: Advantages and Disadvantages of User Collaborative Filtering
  • python coding implementation

1. Algorithmic Definition

User-based collaborative filtering algorithm Commodity-based Collaborative Filtering Algorithms
Applicable scenario Areas with strong timeliness and individualized user interest are not obvious Long tail goods are abundant in areas where users have strong personalized needs
Occasions with fewer users: news recommendation Where the number of items is far less than the number of users: books, e-commerce and movie websites
Key points Hotspots of Small Groups with Similar Interests Maintain user's historical interest and recommendation result is explanatory.
User Interaction Characteristics User's new behavior does not necessarily lead to immediate changes in recommendation results Real-time changes in recommendation results
shortcoming Computational complexity - user growth: approximate to the square relation, recommendation results are not very explanatory

1.1 algorithm optimization


Assuming that <user-commodity data> is as follows, the similarity between commodities can be calculated.

1::a	1::b	1::c	2::a	
2::b	3::c	4::a	4::d

2. python implementation

Step 1: Computing similarity between goods (cosine similarity formula)

Step 2: Filter & Generate Recommendation List, Summation of Preference Values

The code is as follows

# -*-coding:utf-8-*-
import math
#Step 1: Calculate the similarity between each item
def getMostLikeItemsGroupK(itemUsersDict):
    recommDict = {}
    for i, usersI in itemUsersDict.items():
        for j, usersJ in itemUsersDict.items():
            if i < j:
                if i not in recommDict.keys():
                    recommDict[i] = {}
                if j not in recommDict.keys():
                    recommDict[j] = {}

                # Computation of similarity and number of common interactive products
                ratio= 0.0
                setA = set(usersI)
                setB = set(usersJ)
                common = setA.intersection(setB)
                if common.__len__() == 0:
                    ratio= 0.0
                else:
                    bot = math.sqrt(setA.__len__() * setB.__len__())
                    ratio= float(common.__len__()) / bot
                recommDict[i][j] = ratio
                recommDict[j][i] = ratio
    # Sort each similarity dictionary in M in reverse order of similarity
    dict2 = {}
    for k,v in recommDict.items():
        dict2[k] = sorted(v.items(), key=lambda e: e[1], reverse=True)
    #1. Non-normalization of Item Similarity
    #return dict2

    #2. Normalization of Item Similarity: Karypis found in the study that if
    def normalItemLike(parmDict):
        dict3 = {}
        for k,v in parmDict.items():
            dict3[k] = []
            mx = v[0][1]
            for t in v:
                dict3[k].append((t[0] , float(t[1]) / mx))
        return dict3
    return normalItemLike(dict2)


##############Step 2: Filter & Generate Recommendation List, Summation of Preference Values
def recommItemCF(u,K):
    # Prepare data: Recommendation dictionary returned [], dictionary between users - > commodities, dictionary between commodities - > users
    recommDict = {}
    userItemsDict = {}
    itemUsersDict = {}
    file = open("/home/wang/IdeaProjects/big124/mypython/a")
    while True :
        line = file.readline()
        if line != '':
            arr = line.replace("\n","").split("::")
            uid = int(arr[0])
            iid = str(arr[1])
            # userItemDict
            if uid in userItemsDict.keys():
                userItemsDict[uid].append(iid)
            else :
                userItemsDict[uid] = [iid]
            # itemUsersDict
            if iid in itemUsersDict.keys():
                itemUsersDict[iid].append(uid)
            else:
                itemUsersDict[iid] = [uid]
        else:
            break ;
    # 1. Search for each user's history product: other similar product groups
    dict1 = getMostLikeItemsGroupK(itemUsersDict)

    for i in userItemsDict[u]: # Find historical goods for each user
        itemsK = dict1[i][:K]  #Take out k similar goods
        for j in itemsK:
            if j[0] not in userItemsDict[u]:

                #2. Generate a list of product recommendations
                if j[0] not in recommDict.keys():
                    recommDict[j[0]] = 0.0
                recommDict[j[0]] = recommDict[j[0]] + j[1]
    #3. Sort the list of commodity recommendations
    return sorted(recommDict.items() , key=lambda e:e[1] ,reverse=True)

###############The third step, test result############################
if __name__ =="__main__":
    recommList=recommItemCF(2,3)
    for i in recommList:
        print i

Contrast results: The accuracy of normalized similarity of goods becomes higher based on collaborative filtering of goods.

#User-based collaborative filtering: recommendation results
# ('c', 1.533734661597076)
# ('d', 0.8304820237218405)

#Commodity-based collaborative filtering: item similarity is not normalized, recommendation results
# ('c', 0.9082482904638631)
# ('d', 0.5773502691896258)
#Commodity-based Collaborative Filtering: Normalization of Item Similarity, Recommendation Results
# ('c', 1.1123724356957945)
# ('d', 0.7071067811865476)

Posted by centerwork on Mon, 28 Jan 2019 23:15:14 -0800