Implementing the NBCF product recommendation algorithm

Tram Ho

Idea

  • A recommender system is a component of an information system. Its purpose is to help users find relevant information by predicting the interest or rating a user would give to an item that they have not yet considered. In this article, I implement the NBCF algorithm in PHP.
  • NBCF stands for Neighborhood-based Collaborative Filtering
  • The basic idea of NBCF is to estimate a user's level of interest in an item from other users who are similar to that user. The similarity between users can be determined from their interest in other items that the system already knows about. For example, A and B both like the film Criminal Police and rate it 5 stars. We already know that A also likes Judges, so B is likely to like that film too.
  • The two most important questions in a Neighborhood-based Collaborative Filtering system are:
    • How to determine the similarity between two users?
    • Once similar users have been identified, how do we predict a user's level of interest in an item?

Implementation

Similarity Functions

The first and most important task is to determine the similarity between two users. The only data we have is the utility matrix Y.

[Image: example utility matrix Y with the ratings of users u0 to u6]

The image above is an example of a utility matrix built from the scores that users gave to products. Visually, u0's behavior is more similar to u1's than to that of u2, u3, u4, u5 or u6. From this we can predict that u0 will be interested in i2, because u1 is also interested in this item. We denote the similarity between two users ui and uj by sim(ui, uj).
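As a minimal sketch, a utility matrix can be held as a nested mapping from user to item to rating, with unrated items simply left out. The ratings below are made up for illustration and are not the values from the image; the helper name `rated_by` is hypothetical, and Python is used only to keep the sketch short (it is not the author's PHP code).

```python
# Hypothetical ratings on a 0-5 scale; NOT the values from the image above.
# Items a user has not rated are simply missing from that user's dict.
utility = {
    "u0": {"i0": 5, "i1": 4, "i3": 2},
    "u1": {"i0": 5, "i2": 4, "i3": 2},
    "u2": {"i0": 2, "i1": 1, "i3": 5},
}


def rated_by(utility, item):
    """Return the users who have rated `item`, in sorted order."""
    return sorted(u for u, ratings in utility.items() if item in ratings)


print(rated_by(utility, "i1"))  # ['u0', 'u2']
```

Leaving unrated items out of the mapping keeps "unknown" distinct from an actual rating of 0 until the data is normalized.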

Cosine Similarity

The similarity function used here is cosine similarity, which is the most widely used one. It is the cosine of the angle between two vectors a and b:

$$\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a}^T \mathbf{b}}{\|\mathbf{a}\|_2 \, \|\mathbf{b}\|_2}$$
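The cosine of the angle between two rating vectors can be computed as in the Python sketch below. The function name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions made for illustration; this is not the API of the PHP library used in the article.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal vectors)
```

The result lies in [-1, 1]: values near 1 mean the two users rate in the same direction, values near -1 mean opposite tastes.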

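[Image: Figures a) to e), including the utility matrix with per-user averages in a) and the normalized utility matrix in b)]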

Normalizing the data

In the image above, the last row in Figure a) is the average of the ratings given by each user. High values correspond to easy-going users, and low values to strict ones. If we then subtract this average from each of that user's ratings and replace the unknown values with 0, we obtain the normalized utility matrix shown in Figure b).
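The normalization step described above can be sketched in Python as follows. The data is made up, and `normalize` is a hypothetical helper name; the sketch only illustrates subtracting each user's mean and filling unknown ratings with 0.

```python
def normalize(utility, items):
    """Subtract each user's mean rating; fill unrated items with 0."""
    means, normalized = {}, {}
    for user, ratings in utility.items():
        # Assumption: a user with no ratings gets a mean of 0.
        mean = sum(ratings.values()) / len(ratings) if ratings else 0.0
        means[user] = mean
        normalized[user] = [
            ratings[i] - mean if i in ratings else 0.0 for i in items
        ]
    return normalized, means


# Hypothetical ratings, not the values from the figure.
utility = {"u0": {"i0": 5, "i1": 4, "i3": 3}, "u1": {"i0": 2, "i2": 4}}
items = ["i0", "i1", "i2", "i3"]
normalized, means = normalize(utility, items)
print(means)       # {'u0': 4.0, 'u1': 3.0}
print(normalized)  # {'u0': [1.0, 0.0, 0.0, -1.0], 'u1': [-1.0, 0.0, 1.0, 0.0]}
```

After normalization, a positive value means the user liked the item more than their own average, and 0 stands for either "unknown" or "exactly average".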

Predicting interest

A common formula used to predict the rating of user u for item i is:

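$$\hat{y}_{i,u} = \frac{\sum_{u_j \in N(u,i)} \bar{y}_{i,u_j} \, \text{sim}(u, u_j)}{\sum_{u_j \in N(u,i)} |\text{sim}(u, u_j)|}$$

where $\bar{y}_{i,u_j}$ is the normalized rating of user u_j for item i, and N(u, i) is the set of the k users most similar to u (its neighborhood) who have rated i. An example of calculating the normalized rating of u1 for i1 with k = 2 is given in Figure e). The steps are: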

  • Identify the users who have rated i1: u0, u3 and u5.
  • Compute the similarity of u1 with each of these users: 0.83, -0.40 and -0.23. The two (k = 2) largest values are 0.83 and -0.23, which correspond to u0 and u5.
  • Take the normalized ratings of u0 and u5 for i1: 0.75 and 0.5, respectively.
  • Predict the result:

$$\hat{y}_{i_1,u_1} = \frac{0.83 \times 0.75 + (-0.23) \times 0.5}{0.83 + |-0.23|} \approx 0.48$$
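The steps above can be sketched in Python as follows. The weighted average with absolute similarities in the denominator is the commonly used form and is assumed here; `predict_normalized` is a hypothetical helper name, and the normalized rating for u3 is a placeholder because the text does not give it (it is not used when k = 2).

```python
def predict_normalized(neighbors, k):
    """Predict a normalized rating from (similarity, normalized_rating) pairs.

    `neighbors` holds one pair per user who has rated the item.
    """
    # Keep the k users with the highest similarity.
    top_k = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    denom = sum(abs(sim) for sim, _ in top_k)
    if denom == 0:
        return 0.0  # assumption: no usable neighbors means "average" (0)
    return sum(sim * rating for sim, rating in top_k) / denom


neighbors = [
    (0.83, 0.75),   # u0
    (-0.40, 0.0),   # u3 (placeholder rating, not given in the text)
    (-0.23, 0.5),   # u5
]
print(round(predict_normalized(neighbors, k=2), 2))  # 0.48
```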

The normalized ratings can be converted back to the 5-point scale by adding to each column of the matrix the average rating of the corresponding user, as calculated in Figure a).
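A minimal Python sketch of this conversion is shown below. The user average of 3.0 is made up, `to_rating_scale` is a hypothetical helper name, and clamping the result to the range 0 to 5 is an extra assumption that the article does not state.

```python
def to_rating_scale(normalized_prediction, user_mean, lo=0.0, hi=5.0):
    """Add the user's mean rating back and clamp to the rating scale."""
    # Assumption: the scale bounds are 0 and 5; adjust lo/hi if it is 1 to 5.
    return max(lo, min(hi, normalized_prediction + user_mean))


print(to_rating_scale(0.5, 3.0))  # 3.5
print(to_rating_scale(1.5, 4.5))  # 5.0 (clamped to the top of the scale)
```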

Function logic

  • In this system, the algorithm above is applied to the product suggestion feature. The system measures the similarity between users and then suggests products.
  • The library I use is https://github.com/algenza/cosine-similarity
| Function name | Input | Output | Description |
|---|---|---|---|
| getSimilarity() | User rating matrix, the user who needs suggestions, another user | The similarity between the two users: the cosine of the angle between their two vectors, in the range [-1, 1] | Returns the result of comparing the user who needs suggestions with another user. In this system's configuration, only results with similarity > 0.5 are used. |
| getRecommendation() | Matrix of user ratings for products, the user who needs suggestions | Array of products that the user is predicted to like | Returns an array of products the user may like, with a predicted score > 3 |
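To make the two functions concrete, here is a rough Python sketch of the logic described in the table. It is not the author's PHP code and does not use the API of the library linked above: the names `get_similarity` and `get_recommendation`, the data and the choice to mean-center the ratings inside the similarity function are all assumptions. Only the two thresholds (similarity > 0.5, predicted score > 3) come from the article.

```python
import math

SIM_THRESHOLD = 0.5     # from the article: keep neighbors with similarity > 0.5
RATING_THRESHOLD = 3.0  # from the article: keep predictions > 3


def _mean(ratings):
    # Assumption: every user has rated at least one item.
    return sum(ratings.values()) / len(ratings)


def get_similarity(utility, user, other):
    """Cosine similarity of two users' mean-centered rating vectors."""
    items = sorted(set(utility[user]) | set(utility[other]))

    def vec(u):
        m = _mean(utility[u])
        return [utility[u][i] - m if i in utility[u] else 0.0 for i in items]

    a, b = vec(user), vec(other)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def get_recommendation(utility, user):
    """Unrated items whose predicted rating exceeds RATING_THRESHOLD."""
    sims = {o: get_similarity(utility, user, o) for o in utility if o != user}
    sims = {o: s for o, s in sims.items() if s > SIM_THRESHOLD}
    candidates = {i for o in sims for i in utility[o]} - set(utility[user])
    recommended = []
    for item in sorted(candidates):
        raters = [o for o in sims if item in utility[o]]
        denom = sum(abs(sims[o]) for o in raters)
        weighted = sum(
            sims[o] * (utility[o][item] - _mean(utility[o])) for o in raters
        )
        # Convert the normalized prediction back to the rating scale.
        predicted = weighted / denom + _mean(utility[user])
        if predicted > RATING_THRESHOLD:
            recommended.append(item)
    return recommended


# Hypothetical ratings for illustration only.
utility = {
    "A": {"film1": 5, "film2": 1},
    "B": {"film1": 5, "film2": 1, "film3": 5},
    "C": {"film1": 1, "film2": 5, "film4": 5},
}
print(get_recommendation(utility, "A"))  # ['film3']
```

In this example B rates like A and C rates the opposite way, so only B passes the similarity threshold and only B's extra film is suggested.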

The code I implemented above may not be well optimized. If you have a better idea, please share it below to help me.

Source : Viblo