As in the previous post, now that Part 1 has covered the theory, this part presents a demo of the algorithm. Let's walk through it together.
1. Building class MF
Initialization function
Input parameters:
- Y : utility matrix with 3 columns; each row holds 3 values: user_id, item_id, rating.
- n_factors : number of latent dimensions shared by users and items, default n_factors = 2 .
- X : item matrix of shape (number of items, n_factors); initialized randomly if not given.
- W : user matrix of shape (n_factors, number of users); initialized randomly if not given.
- lamda : regularization weight in the loss function, used to avoid overfitting, default lamda = 0.1 .
- learning_rate : step size of Gradient Descent, used to adjust the learning speed, default learning_rate = 2 .
- n_epochs : number of training iterations, default n_epochs = 50 .
- top : number of items suggested per user, default top = 10 .
- filename : file in which to store the evaluation data.
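To make the expected layout of Y concrete, here is a minimal sketch of a utility matrix in the three-column format described above. The ratings are made-up toy data, and the sketch assumes that user and item ids start at 0, which is what the "max id + 1" counts in the constructor imply.

```python
import numpy as np

# Hypothetical toy data: each row is (user_id, item_id, rating).
Y = np.array([
    [0, 0, 5.0],
    [0, 2, 3.0],
    [1, 1, 4.0],
    [2, 0, 1.0],
    [2, 2, 2.0],
])

# Ids are assumed to be 0-based, so the counts are max id + 1.
users_count = int(np.max(Y[:, 0])) + 1
items_count = int(np.max(Y[:, 1])) + 1
n_factors = 2

# Item factors X (items x factors) and user factors W (factors x users).
rng = np.random.default_rng(0)
X = rng.standard_normal((items_count, n_factors))
W = rng.standard_normal((n_factors, users_count))

print(users_count, items_count, X.shape, W.shape)  # 3 3 (3, 2) (2, 3)
```

With these shapes, X.dot(W) is an items x users matrix of predicted ratings (before the bias terms are added).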
```python
import numpy as np


class MF(object):
    def __init__(self, Y, n_factors=2, X=None, W=None, lamda=0.1,
                 learning_rate=2, n_epochs=50, top=10, filename=None):
        # optional file for evaluation output
        self.f = open(filename, 'a+') if filename else None
        self.Y = Y
        self.lamda = lamda
        self.n_factors = n_factors
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs
        self.top = top
        # ids are 0-based, so the counts are max id + 1
        self.users_count = int(np.max(self.Y[:, 0])) + 1
        self.items_count = int(np.max(self.Y[:, 1])) + 1
        self.ratings_count = Y.shape[0]
        # use the given matrices, or initialize them randomly
        # ("is None" avoids an element-wise comparison on arrays)
        self.X = np.random.randn(self.items_count, n_factors) if X is None else X
        self.W = np.random.randn(n_factors, self.users_count) if W is None else W
        self.Ybar = self.Y.copy()
        # item and user biases
        self.bi = np.random.randn(self.items_count)
        self.bu = np.random.randn(self.users_count)
        self.n_ratings = self.Y.shape[0]
```
By changing these parameters, you can observe how they influence the evaluation results of the algorithm.
get_user_rated_item() and get_item_rated_by_user()
get_user_rated_item(i) returns the list of users who have rated item i, together with their ratings.
```python
    def get_user_rated_item(self, i):
        # rows of the utility matrix whose item id equals i
        ids = np.where(i == self.Ybar[:, 1])[0].astype(int)
        users = self.Ybar[ids, 0].astype(int)
        ratings = self.Ybar[ids, 2]
        return (users, ratings)
```
get_item_rated_by_user(u) returns the list of items rated by user u, together with their ratings.
```python
    def get_item_rated_by_user(self, u):
        # rows of the utility matrix whose user id equals u
        ids = np.where(u == self.Ybar[:, 0])[0].astype(int)
        items = self.Ybar[ids, 1].astype(int)
        ratings = self.Ybar[ids, 2]
        return (items, ratings)
```
We will use these two functions to optimize the two matrices X and W.
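As a quick illustration of what these lookups return, the sketch below applies the same np.where filtering to a small, made-up ratings array. The standalone function names users_who_rated and items_rated_by are hypothetical stand-ins for the two methods.

```python
import numpy as np

# Hypothetical toy ratings: each row is (user_id, item_id, rating).
Ybar = np.array([
    [0, 0, 5.0],
    [0, 2, 3.0],
    [1, 1, 4.0],
    [2, 0, 1.0],
    [2, 2, 2.0],
])

def users_who_rated(Ybar, i):
    """Return (user ids, ratings) for the rows whose item id equals i."""
    ids = np.where(Ybar[:, 1] == i)[0]
    return Ybar[ids, 0].astype(int), Ybar[ids, 2]

def items_rated_by(Ybar, u):
    """Return (item ids, ratings) for the rows whose user id equals u."""
    ids = np.where(Ybar[:, 0] == u)[0]
    return Ybar[ids, 1].astype(int), Ybar[ids, 2]

users, ratings_i = users_who_rated(Ybar, 0)
items, ratings_u = items_rated_by(Ybar, 2)
print(users, ratings_i)  # [0 2] [5. 1.]
print(items, ratings_u)  # [0 2] [1. 2.]
```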
The updateX and updateW functions:
These two functions optimize X and W respectively. For each item (or user), the inner loop runs a fixed 50 update steps.
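Reading the update off the code that follows, the step for one item vector can be written as below. The notation is my own and may differ from Part 1: s is n_ratings, x_m is a row of X, w_n a column of W, b_m and d_n are the item and user biases (bi and bu), η is learning_rate, and the sums run over the users n who rated item m. The square and the division in the last line are element-wise.

```latex
\[
\begin{aligned}
e_{mn} &= x_m w_n + b_m + d_n - y_{mn} \\
\nabla_{x_m}\mathcal{L} &= \frac{1}{s}\sum_{n} e_{mn}\, w_n^{\top} + \lambda\, x_m,
\qquad
\frac{\partial \mathcal{L}}{\partial b_m} = \frac{1}{s}\sum_{n} e_{mn} \\
G &\leftarrow G + \left(\nabla_{x_m}\mathcal{L}\right)^{2},
\qquad
x_m \leftarrow x_m - \eta\,\frac{\nabla_{x_m}\mathcal{L}}{\sqrt{G}},
\qquad G_0 = 10^{-8}
\end{aligned}
\]
```

The bias b_m is updated the same way with its own accumulator, and the user update is symmetric, with the roles of X and W swapped. Scaling each step by the root of the accumulated squared gradients is the AdaGrad idea, which is why a learning rate as large as 2 can still be usable.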
```python
    def updateX(self):
        for m in range(self.items_count):
            users, ratings = self.get_user_rated_item(m)
            Wm = self.W[:, users]
            b = self.bu[users]
            # accumulated squared gradients (small start value avoids division by zero)
            sum_grad_xm = np.full(shape=(self.X[m].shape), fill_value=1e-8)
            sum_grad_bm = 1e-8
            for i in range(50):
                xm = self.X[m]
                error = xm.dot(Wm) + self.bi[m] + b - ratings
                grad_xm = error.dot(Wm.T) / self.n_ratings + self.lamda * xm
                grad_bm = np.sum(error) / self.n_ratings
                sum_grad_xm += grad_xm ** 2
                sum_grad_bm += grad_bm ** 2
                # gradient descent, scaled by the accumulated gradients
                self.X[m] -= self.learning_rate * grad_xm.reshape(-1) / np.sqrt(sum_grad_xm)
                self.bi[m] -= self.learning_rate * grad_bm / np.sqrt(sum_grad_bm)

    def updateW(self):
        for n in range(self.users_count):
            items, ratings = self.get_item_rated_by_user(n)
            Xn = self.X[items, :]
            b = self.bi[items]
            sum_grad_wn = np.full(shape=(self.W[:, n].shape), fill_value=1e-8)
            sum_grad_bn = 1e-8
            for i in range(50):
                wn = self.W[:, n]
                error = Xn.dot(wn) + self.bu[n] + b - ratings
                grad_wn = Xn.T.dot(error) / self.n_ratings + self.lamda * wn
                grad_bn = np.sum(error) / self.n_ratings
                sum_grad_wn += grad_wn ** 2
                sum_grad_bn += grad_bn ** 2
                # gradient descent, scaled by the accumulated gradients
                self.W[:, n] -= self.learning_rate * grad_wn.reshape(-1) / np.sqrt(sum_grad_wn)
                self.bu[n] -= self.learning_rate * grad_bn / np.sqrt(sum_grad_bn)
```
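To see the update rule in isolation, here is a minimal standalone sketch that applies the same kind of scaled gradient step to a single item vector. It is a simplification, not the class's method: the data is made up, the bias terms are left out, the gradient is averaged over this item's ratings only, and a smaller learning rate than the post's default is used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_factors, n_users = 2, 4

# Hypothetical data: fixed user factors and the ratings those users gave one item.
Wm = rng.standard_normal((n_factors, n_users))
ratings = np.array([4.0, 3.0, 5.0, 2.0])
xm = rng.standard_normal(n_factors)

lamda = 0.1
learning_rate = 0.1  # assumption: smaller than the post's default of 2

def loss(x):
    """Regularized mean squared error for one item vector (biases omitted)."""
    error = x.dot(Wm) - ratings
    return 0.5 * np.mean(error ** 2) + 0.5 * lamda * x.dot(x)

loss_before = loss(xm)

# Accumulated squared gradients; the small start value avoids division by zero.
sum_grad = np.full(xm.shape, 1e-8)
for _ in range(50):
    error = xm.dot(Wm) - ratings
    grad = error.dot(Wm.T) / len(ratings) + lamda * xm
    sum_grad += grad ** 2
    # Each coordinate's step shrinks as its accumulated gradient grows.
    xm = xm - learning_rate * grad / np.sqrt(sum_grad)

loss_after = loss(xm)
print(loss_before, loss_after)
```

Because each step is divided by the root of the accumulated squared gradients, no coordinate moves by more than learning_rate in a single step, and the steps get smaller as training goes on.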
Main algorithm
```python
    def fit(self, x, data_size, Data_test, test_size=0):
        for i in range(self.n_epochs):
            self.updateW()
            self.updateX()
            # evaluate on the test data every x epochs
            if (i + 1) % x == 0:
                self.RMSE(Data_test, data_size=data_size, test_size=test_size, p=i + 1)
                # self.evaluate(data_size, Data_test, test_size = 0)

    def pred(self, u, i):
        u = int(u)
        i = int(i)
        pred = self.X[i, :].dot(self.W[:, u]) + self.bi[i] + self.bu[u]
        # clip the prediction to the rating range [0, 5]
        return max(0, min(5, pred))

    def recommend(self, u):
        ids = np.where(self.Y[:, 0] == u)[0].astype(int)
        items_rated_by_user = self.Y[ids, 1].astype(int)
        # predicted score of every item for user u
        a = self.X.dot(self.W[:, u]) + self.bi + self.bu[u]
        # never recommend items the user has already rated
        a[items_rated_by_user] = -np.inf
        # keep items with a positive predicted score, best first
        # (the original overwrote its argsort result and read an undefined self.limit)
        ranked = np.argsort(a)[::-1]
        ranked = ranked[a[ranked] > 0]
        return ranked[:self.top]
```
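The ranking step in recommend can be sketched on its own as follows. The function top_k_unrated and the sample scores are hypothetical, chosen only to show how masking already-rated items and sorting by predicted score produces a top-k list.

```python
import numpy as np

def top_k_unrated(scores, rated_items, k):
    """Return up to k item ids with the highest scores, skipping rated items."""
    scores = np.asarray(scores, dtype=float).copy()
    rated = np.asarray(list(rated_items), dtype=int)
    scores[rated] = -np.inf                     # never recommend rated items
    order = np.argsort(scores)[::-1]            # best score first
    order = order[np.isfinite(scores[order])]   # drop the masked entries
    return order[:k]

# Hypothetical predicted scores for five items; the user already rated items 1 and 3.
scores = [3.2, 4.9, 1.0, 4.5, 3.8]
print(top_k_unrated(scores, rated_items=[1, 3], k=2))  # [4 0]
```

Sorting by score, rather than filtering by a threshold alone, ensures that the k items returned are the ones with the highest predicted ratings.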
2. Evaluation
As with the previous two methods, I use two measures here: RMSE and precision/recall (PR):
```python
# Methods of class MF

def RMSE(self, Data_test, test_size=0, data_size='100K', p=10):
    # Root mean squared error over all (user, item, rating) rows of the test set
    n_tests = Data_test.shape[0]
    SE = 0
    for n in range(n_tests):
        pred = self.pred(Data_test[n, 0], Data_test[n, 1])
        SE += (pred - Data_test[n, 2]) ** 2
    RMSE = np.sqrt(SE / n_tests)
    print('%s::1::%d::%d::%r::%r::%r\r\n' % (
        str(data_size), self.n_factors, self.n_epochs, self.lamda, self.lr, RMSE))
    # The log file record also stores p
    self.f.write('%s::1::%d::%d::%d::%r::%r::%r\r\n' % (
        str(data_size), self.n_factors, self.n_epochs, p, self.lamda, self.lr, RMSE))
    return RMSE

def evaluate(self, data_size, Data_test, test_size=0):
    # Precision and recall of the recommendations against the test set
    sum_p = 0
    self.Pu = np.zeros((self.users_count,))
    for u in range(self.users_count):
        recommended_items = self.recommend(u)
        ids = np.where(Data_test[:, 0] == u)[0]
        rated_items = Data_test[ids, 1]
        # Count recommended items that the user actually rated in the test set
        for i in recommended_items:
            if i in rated_items:
                self.Pu[u] += 1
        sum_p += self.Pu[u]
    p = sum_p / (self.users_count * self.limit)
    r = sum_p / (Data_test.shape[0])
    self.f.write('%s::1::%d::%d::%d::%r::%r::%r\r\n' % (
        str(data_size), self.top, self.n_factors, self.n_epochs, test_size, p, r))
    return p, r
```
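To make the two measures easier to check by hand, here is a minimal standalone sketch, independent of the `MF` class. The function names and the dictionary-based inputs are my own assumptions for illustration. Note one deliberate difference: this sketch divides the hit count by the number of items actually recommended, whereas the `evaluate` method divides by `users_count * limit`.

```python
import numpy as np


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def precision_recall(recommended, relevant):
    """Precision and recall pooled over all users.

    recommended: dict mapping user -> list of recommended item ids
    relevant:    dict mapping user -> set of item ids the user rated in the test set
    """
    # A hit is a recommended item that also appears in the user's test ratings.
    hits = sum(len(set(items) & relevant.get(u, set()))
               for u, items in recommended.items())
    n_recommended = sum(len(items) for items in recommended.values())
    n_relevant = sum(len(items) for items in relevant.values())
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall


print(rmse([3.0, 4.0], [4.0, 4.0]))
print(precision_recall({0: [1, 2], 1: [3, 4]}, {0: {2, 5}, 1: {3}}))
```

In the toy data, each user has one hit out of two recommendations, so precision is 2/4, and there are three relevant items in total, so recall is 2/3.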
3. Demo with the MovieLens dataset
```python
# The constructor's keyword is learning_rate (see the initialization function in section 1)
rs = MF(rate_train, n_factors=2, lamda=0.01, learning_rate=0.1, n_epochs=20,
        filename='RMSE_100K_MF.dat')
rs.fit(10, "100K", rate_test)
rs.f.close()
```
The results I obtained are:
```
100K::1::2::20::0.01::0.1::0.9634817342439627
100K::1::2::20::0.01::0.1::0.9634984986336697
```
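Each printed record is a `::`-separated line. Judging from the format string in `RMSE`, the fields are the dataset size, a constant `1` whose meaning I do not know, `n_factors`, `n_epochs`, `lamda`, the learning rate, and the RMSE. A small parser for that printed format could look like the sketch below; `RmseRecord` and `parse_rmse_line` are hypothetical names, and the records written to the log file carry an extra field, so this applies to the printed lines only.

```python
from typing import NamedTuple


class RmseRecord(NamedTuple):
    data_size: str
    n_factors: int
    n_epochs: int
    lamda: float
    learning_rate: float
    rmse: float


def parse_rmse_line(line):
    """Parse one printed record; field meanings are inferred from the format string."""
    parts = line.strip().split("::")
    if len(parts) != 7:
        raise ValueError("expected 7 fields, got %d" % len(parts))
    # The second field is a constant in the printed output and is ignored here.
    data_size, _constant, n_factors, n_epochs, lamda, lr, rmse = parts
    return RmseRecord(data_size, int(n_factors), int(n_epochs),
                      float(lamda), float(lr), float(rmse))


record = parse_rmse_line("100K::1::2::20::0.01::0.1::0.9634817342439627")
print(record.n_factors, record.rmse)
```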
Vary the hyperparameters to find the best combination:
```python
for i in [50, 60]:                       # n_factors
    for j in [0.01, 0.1, 0.5, 1]:        # lamda
        for k in [0.1, 0.5, 0.75, 1, 2]: # learning_rate
            # A filename is needed because the results are written to rs.f;
            # the name used here is only illustrative.
            rs = MF(rate_train, n_factors=i, lamda=j, learning_rate=k, n_epochs=10,
                    filename='RMSE_1M_MF.dat')
            rs.fit(10, data_size="1M", Data_test=rate_test, test_size=0.1)
            rs.f.close()
```
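The nested loops can also be written as a generic grid search that keeps track of the best result. This is only a sketch under stated assumptions: `grid_search` is a hypothetical helper, and `toy_score` is a made-up stand-in for "train an `MF` model and return its RMSE", so the numbers it produces mean nothing for the real dataset.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Try every combination in grid and return the one with the lowest score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(n_factors, lamda, learning_rate):
    # Made-up stand-in for training a model and computing its RMSE.
    return abs(n_factors - 50) / 100 + abs(lamda - 0.1) + abs(learning_rate - 0.5)


grid = {
    "n_factors": [50, 60],
    "lamda": [0.01, 0.1, 0.5, 1],
    "learning_rate": [0.1, 0.5, 0.75, 1, 2],
}
best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

To use it with the real model, `score_fn` would build an `MF` instance from the given parameters, train it, and return the RMSE on the test set.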
Source code and references:
https://machinelearningcoban.com/2017/05/31/matrixfactorization/