In this Part 2, I write a demo for Neighborhood-based Collaborative Filtering (NBCF).
The theory was covered in Part 1, which you can revisit if needed.
1. Distance functions
In this post, I demo two basic distance functions: cosine and Pearson.
For cosine, I use the `cosine_similarity` function that ships with sklearn.
For Pearson, I use `pearsonr` from scipy. Since that function does not accept a matrix (a 2-D array), I wrap it as follows:
```python
import numpy as np
from scipy import sparse
from scipy.stats import pearsonr  # scipy.stats.stats is a deprecated import path


def pearson(X, Y=None):
    # Y is unused; it only keeps the signature compatible with cosine_similarity(X, Y).
    # pearsonr works on 1-D arrays, so densify X (which may be sparse) first.
    u = X.toarray() if sparse.issparse(X) else np.asarray(X, dtype=float)
    n = u.shape[0]
    a = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            temp = pearsonr(u[i], u[j])[0]
            # pearsonr returns NaN for constant rows; treat that as zero similarity.
            a[i][j] = temp if not np.isnan(temp) else 0
    return a
```
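As a side note, the double loop over `pearsonr` can be slow for many users. The sketch below is not the code used in this post; it assumes a dense input and uses `np.corrcoef`, which computes the Pearson correlation between all pairs of rows at once. The name `pearson_corrcoef` is made up for this illustration.

```python
import numpy as np


def pearson_corrcoef(X):
    """Row-wise Pearson correlation matrix; undefined (NaN) entries become 0."""
    dense = np.asarray(X, dtype=float)
    # Constant rows have zero variance, so their correlation is 0/0 = NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(dense)
    return np.nan_to_num(corr, nan=0.0)


rows = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],   # perfectly correlated with row 0
    [3.0, 2.0, 1.0],   # perfectly anti-correlated with row 0
    [5.0, 5.0, 5.0],   # constant row: correlation undefined, mapped to 0
])
S = pearson_corrcoef(rows)
print(np.round(S, 3))
```

For a sparse matrix, call `.toarray()` before passing it in.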
2. Building the NBCF class
Constructor:
Input parameters:
- `Y`: the utility matrix, given as an array with 3 columns; each row holds 3 values: user_id, item_id, rating.
- `k`: the number of neighbors used to predict a rating.
- `uuCF`: set `uuCF = 1` to use user-user CF (uuCF) and `uuCF = 0` to use item-item CF (iiCF). Defaults to `1`.
- `dist_f`: the distance function. As mentioned in section 1, this post uses cosine and Pearson. Defaults to `cosine_similarity` from sklearn.
- `limit`: the number of items recommended to each user. Defaults to `10`.
```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity


class NBCF(object):
    def __init__(self, Y, k, uuCF=1, dist_f=cosine_similarity, limit=10):
        self.uuCF = uuCF
        # Evaluation results are appended to this log file.
        self.f = open('danhgiaNBCF.dat', 'a+')
        # For item-item CF, swap the user and item columns.
        self.Y = Y if uuCF else Y[:, [1, 0, 2]]
        self.Ybar = None
        self.k = k
        self.limit = limit
        self.dist_func = dist_f
        # Ids are assumed to start at 0, so count = largest id + 1.
        self.users_count = int(np.max(self.Y[:, 0])) + 1
        self.items_count = int(np.max(self.Y[:, 1])) + 1
        self.Pu = None
        self.Ru = None
```
Note that the NBCF class handles both iiCF and uuCF with the same code. With uuCF, the utility matrix is used as passed in (`Y`). With iiCF, columns 0 and 1 of `Y` are swapped, so users and items trade places. As a consequence, in iiCF mode `users_count` actually counts items and `items_count` counts users.
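To make the column swap concrete, here is a small sketch on a toy utility matrix. The data is invented for illustration; only the indexing expression `Y[:, [1, 0, 2]]` and the count formula come from the constructor above.

```python
import numpy as np

# Toy utility matrix: each row is (user_id, item_id, rating).
Y = np.array([
    [0, 0, 5.0],
    [0, 1, 3.0],
    [1, 1, 4.0],
    [2, 0, 1.0],
])

# Item-item CF: reorder the columns to (item_id, user_id, rating),
# so that items play the role of "users" in the rest of the class.
Y_item = Y[:, [1, 0, 2]]

# Counts are derived from the largest id in each column plus one
# (ids are assumed to start at 0).
users_count = int(np.max(Y[:, 0])) + 1
items_count = int(np.max(Y[:, 1])) + 1

print(Y_item)
print(users_count, items_count)
```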
Normalization function
As described in the theory in Part 1, each user's ratings are normalized by subtracting the mean of all ratings that user has given:
```python
    def normalizeY(self):
        users = self.Y[:, 0]
        # Work on a float copy so the centered ratings are not truncated.
        self.Ybar = self.Y.astype(float)
        self.mu = np.zeros((self.users_count,))
        for i in range(self.users_count):
            ids = np.where(users == i)[0].astype(int)
            ratings = self.Y[ids, 2]
            # Users without ratings get a mean of 0 (np.mean of an empty array is NaN).
            m = np.mean(ratings) if ids.size > 0 else 0
            self.mu[i] = m
            self.Ybar[ids, 2] = ratings - self.mu[i]
        # Store the centered ratings as a sparse (items x users) matrix.
        rows = self.Ybar[:, 1].astype(int)
        cols = self.Ybar[:, 0].astype(int)
        self.Ybar = sparse.coo_matrix((self.Ybar[:, 2], (rows, cols)),
                                      (self.items_count, self.users_count))
        self.Ybar = self.Ybar.tocsr()
```
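The mean-centering step is easier to follow on a tiny example. The sketch below uses made-up ratings and a dense array as a stand-in for the sparse matrix, purely for readability; it is not part of the class.

```python
import numpy as np

# Toy utility matrix: each row is (user_id, item_id, rating).
Y = np.array([
    [0, 0, 5.0],
    [0, 1, 3.0],
    [1, 1, 4.0],
    [2, 0, 1.0],
    [2, 1, 2.0],
])
n_users, n_items = 3, 2

mu = np.zeros(n_users)
Ybar = np.zeros((n_items, n_users))  # dense stand-in, laid out as items x users
for u in range(n_users):
    rows = np.where(Y[:, 0] == u)[0]
    if rows.size == 0:
        continue  # a user without ratings keeps a mean of 0
    ratings = Y[rows, 2]
    mu[u] = ratings.mean()
    items = Y[rows, 1].astype(int)
    Ybar[items, u] = ratings - mu[u]

print(mu)    # per-user mean rating
print(Ybar)  # centered ratings; 0 also marks "not rated"
```

User 1 has a single rating, so its centered value is 0, which is indistinguishable from a missing rating in this representation.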
Similarity function
```python
    def similarity(self):
        # Ybar is (items x users); transpose so that each row describes one user.
        self.S = self.dist_func(self.Ybar.T, self.Ybar.T)
```
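To show what this call produces, here is a NumPy-only sketch of cosine similarity between rows. It is an illustration rather than sklearn's implementation: `cosine_rows` is a hypothetical helper, it assumes a dense input, and it assigns similarity 0 to all-zero rows.

```python
import numpy as np


def cosine_rows(A):
    """Cosine similarity between every pair of rows of a dense 2-D array."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows stay zero instead of dividing by 0
    unit = A / norms
    return unit @ unit.T


# Rows are items, columns are users (same layout as Ybar).
Ybar = np.array([
    [1.0, 0.0, -0.5],
    [-1.0, 0.0, 0.5],
])
S = cosine_rows(Ybar.T)  # transpose so that each row describes one user
print(np.round(S, 3))
```

Here users 0 and 2 have exactly opposite tastes (similarity -1), while user 1 has no informative ratings and gets similarity 0 with everyone.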
Rating prediction and item recommendation functions
```python
    def pred(self, u, i, normalized=1):
        u, i = int(u), int(i)  # ids may arrive as floats from a float array
        # Rows of Y that contain a rating for item i.
        ids = np.where(self.Y[:, 1] == i)[0].astype(int)
        if ids.size == 0:  # nobody rated item i
            return 0
        users = (self.Y[ids, 0]).astype(int)
        sim = self.S[u, users]
        # Keep the k users most similar to u.
        a = np.argsort(sim)[-self.k:]
        nearest = sim[a]
        r = self.Ybar[i, users[a]]  # 1 x k sparse row of centered ratings
        # The small constant avoids division by zero.
        score = r.dot(nearest)[0] / (np.abs(nearest).sum() + 1e-8)
        if normalized:
            return score
        return score + self.mu[u]

    def _pred(self, u, i, normalized=1):
        if self.uuCF:
            return self.pred(u, i, normalized)
        return self.pred(i, u, normalized)
```
Note that with `uuCF = 0`, the two arguments `u` and `i` are swapped when calling `pred`.
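The prediction itself is a similarity-weighted average over the k nearest neighbors who rated the item. The standalone sketch below mirrors that computation on plain arrays; `predict_rating` and its inputs are made up for illustration and are not part of the class.

```python
import numpy as np


def predict_rating(sims, centered_ratings, k, user_mean=0.0):
    """Weighted average of the k most similar neighbours' centered ratings."""
    sims = np.asarray(sims, dtype=float)
    centered_ratings = np.asarray(centered_ratings, dtype=float)
    top = np.argsort(sims)[-k:]  # indices of the k largest similarities
    nearest = sims[top]
    weighted = float(np.dot(centered_ratings[top], nearest))
    # The small constant avoids division by zero; adding the mean undoes the centering.
    return weighted / (np.abs(nearest).sum() + 1e-8) + user_mean


# Three users rated the item; similarities are measured to the target user.
sims = [0.9, 0.1, 0.5]
centered = [1.0, -2.0, 0.5]
pred = predict_rating(sims, centered, k=2, user_mean=3.0)
print(round(pred, 4))
```

With k = 2, only the neighbors with similarity 0.9 and 0.5 contribute, giving (0.9 * 1.0 + 0.5 * 0.5) / 1.4 plus the user's mean of 3.0.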
```python
    def recommend(self, u):
        if self.uuCF:
            ids = np.where(self.Y[:, 0] == u)[0].astype(int)
            items_rated_by_user = self.Y[ids, 1].tolist()
            n = self.items_count
        else:
            # Columns of Y are swapped in iiCF mode, so users_count holds the number of items.
            ids = np.where(self.Y[:, 1] == u)[0].astype(int)
            items_rated_by_user = self.Y[ids, 0].tolist()
            n = self.users_count
        a = np.zeros((n,))
        for i in range(n):
            # Only predict for items the user has not rated; rated items keep a score of 0.
            if i not in items_rated_by_user:
                a[i] = self._pred(u, i)
        # Take the `limit` highest-scoring items (or all of them if there are fewer).
        recommended_items = np.argsort(a)[-self.limit:]
        return recommended_items
```
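One thing to be aware of: because rated items keep a score of 0, they can still outrank unrated items whose normalized prediction is negative. The sketch below shows one possible variant that masks rated items with `-inf` before sorting. It is an alternative I am suggesting for illustration, not the behavior of `recommend` above, and `top_n_unrated` is a hypothetical name.

```python
import numpy as np


def top_n_unrated(scores, rated, limit):
    """Indices of the `limit` best-scoring items, excluding already rated ones."""
    scores = np.asarray(scores, dtype=float).copy()
    rated_idx = np.array(sorted(rated), dtype=int)
    scores[rated_idx] = -np.inf  # push rated items to the bottom
    n_candidates = int(np.isfinite(scores).sum())
    order = np.argsort(scores)[::-1]  # best score first
    return order[:min(limit, n_candidates)]


scores = [0.2, -0.4, 1.3, 0.0, 0.7]
print(top_n_unrated(scores, rated={2}, limit=2))
```

Item 2 has the highest score but is already rated, so it is skipped and the next two best items are returned.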
3. Evaluating the algorithm
As with Content-based, I evaluate the algorithm here using RMSE and precision/recall. You can use the following code as a reference:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | <span class="token keyword">def</span> <span class="token function">RMSE</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> data_size<span class="token punctuation">,</span> Data_test<span class="token punctuation">,</span> test_size <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">:</span> SE <span class="token operator">=</span> <span class="token number">0</span> n_tests <span class="token operator">=</span> Data_test<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token keyword">for</span> n <span class="token keyword">in</span> <span class="token builtin">range</span><span class="token punctuation">(</span>n_tests<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">if</span> Data_test<span class="token punctuation">[</span>n<span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">]</span> <span class="token operator">==</span> <span class="token number">1681</span><span class="token punctuation">:</span> pred <span class="token operator">=</span> <span class="token number">0</span> <span class="token keyword">else</span><span class="token punctuation">:</span> pred <span class="token operator">=</span> self<span class="token punctuation">.</span>_pred<span class="token punctuation">(</span>Data_test<span class="token punctuation">[</span>n<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span> Data_test<span class="token punctuation">[</span>n<span class="token punctuation">,</span> <span class="token 
number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span> normalized <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">)</span> SE <span class="token operator">+=</span> <span class="token punctuation">(</span>pred <span class="token operator">-</span> Data_test<span class="token punctuation">[</span>n<span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token operator">**</span><span class="token number">2</span> RMSE <span class="token operator">=</span> np<span class="token punctuation">.</span>sqrt<span class="token punctuation">(</span>SE<span class="token operator">/</span>n_tests<span class="token punctuation">)</span> <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">'%s::%d::%d::cosine_similarity::%r::%rrn'</span> <span class="token operator">%</span> <span class="token punctuation">(</span><span class="token builtin">str</span><span class="token punctuation">(</span>data_size<span class="token punctuation">)</span><span class="token punctuation">,</span> self<span class="token punctuation">.</span>uuCF<span class="token punctuation">,</span> self<span class="token punctuation">.</span>k<span class="token punctuation">,</span> test_size<span class="token punctuation">,</span> RMSE<span class="token punctuation">)</span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>f<span class="token punctuation">.</span>write<span class="token punctuation">(</span><span class="token string">'%s::%d::%d::cosine_similarity::%r::%rrn'</span> <span class="token operator">%</span> <span class="token punctuation">(</span><span class="token builtin">str</span><span class="token punctuation">(</span>data_size<span class="token punctuation">)</span><span class="token 
punctuation">,</span> self<span class="token punctuation">.</span>uuCF<span class="token punctuation">,</span> self<span class="token punctuation">.</span>k<span class="token punctuation">,</span> test_size<span class="token punctuation">,</span> RMSE<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">def</span> <span class="token function">evaluate</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> data_size<span class="token punctuation">,</span> Data_test<span class="token punctuation">,</span> test_size <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">:</span> sum_p <span class="token operator">=</span> <span class="token number">0</span> n <span class="token operator">=</span> self<span class="token punctuation">.</span>users_count <span class="token keyword">if</span> self<span class="token punctuation">.</span>uuCF <span class="token keyword">else</span> self<span class="token punctuation">.</span>items_count self<span class="token punctuation">.</span>Pu <span class="token operator">=</span> np<span class="token punctuation">.</span>zeros<span class="token punctuation">(</span><span class="token punctuation">(</span>n<span class="token punctuation">,</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">for</span> u <span class="token keyword">in</span> <span class="token builtin">range</span><span class="token punctuation">(</span>n<span class="token punctuation">)</span><span class="token punctuation">:</span> recommended_items <span class="token operator">=</span> self<span class="token punctuation">.</span>recommend<span class="token punctuation">(</span>u<span class="token punctuation">)</span> ids <span class="token operator">=</span> np<span class="token punctuation">.</span>where<span class="token 
punctuation">(</span>Data_test<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">==</span> u<span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> rated_items <span class="token operator">=</span> Data_test<span class="token punctuation">[</span>ids<span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">]</span> <span class="token keyword">for</span> i <span class="token keyword">in</span> recommended_items<span class="token punctuation">:</span> <span class="token keyword">if</span> i <span class="token keyword">in</span> rated_items<span class="token punctuation">:</span> self<span class="token punctuation">.</span>Pu<span class="token punctuation">[</span>u<span class="token punctuation">]</span> <span class="token operator">+=</span> <span class="token number">1</span> sum_p <span class="token operator">+=</span> self<span class="token punctuation">.</span>Pu<span class="token punctuation">[</span>u<span class="token punctuation">]</span> p <span class="token operator">=</span> sum_p<span class="token operator">/</span><span class="token punctuation">(</span>n <span class="token operator">*</span> self<span class="token punctuation">.</span>limit<span class="token punctuation">)</span> r <span class="token operator">=</span> sum_p<span class="token operator">/</span><span class="token punctuation">(</span>Data_test<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">)</span> <span class="token keyword">print</span><span 
class="token punctuation">(</span><span class="token string">'%s::%d::%d::cosine_similarity::%r::%r\r\n'</span> <span class="token operator">%</span> <span class="token punctuation">(</span><span class="token builtin">str</span><span class="token punctuation">(</span>data_size<span class="token punctuation">)</span><span class="token punctuation">,</span> self<span class="token punctuation">.</span>uuCF<span class="token punctuation">,</span> self<span class="token punctuation">.</span>limit<span class="token punctuation">,</span> p<span class="token punctuation">,</span> r<span class="token punctuation">)</span><span class="token punctuation">)</span> self<span class="token punctuation">.</span>f<span class="token punctuation">.</span>write<span class="token punctuation">(</span><span class="token string">'%s::%d::%d::cosine_similarity::%r::%r\r\n'</span> <span class="token operator">%</span> <span class="token punctuation">(</span><span class="token builtin">str</span><span class="token punctuation">(</span>data_size<span class="token punctuation">)</span><span class="token punctuation">,</span> self<span class="token punctuation">.</span>uuCF<span class="token punctuation">,</span> self<span class="token punctuation">.</span>limit<span class="token punctuation">,</span> p<span class="token punctuation">,</span> r<span class="token punctuation">)</span><span class="token punctuation">)</span> |
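To make the two evaluation methods above easier to follow, here is a small standalone sketch of the metrics they compute: RMSE over predicted ratings, and precision/recall over top-N recommendation lists. The function names `rmse` and `precision_recall`, the `recommendations` dict, and the toy data are all hypothetical and not part of the original class. Note also that this sketch divides recall by the number of test rows, whereas the code above divides by `Data_test.shape[0] + 1`.

```python
import numpy as np


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def precision_recall(recommendations, data_test):
    """Precision and recall of recommendation lists against held-out ratings.

    recommendations: dict mapping user id -> list of recommended item ids
                     (hypothetical structure, used only for this sketch)
    data_test: array of (user, item, rating) rows
    """
    hits = 0
    n_recommended = 0
    for user, items in recommendations.items():
        # Items this user actually rated in the test set.
        rated = set(data_test[data_test[:, 0] == user, 1].tolist())
        hits += sum(1 for item in items if item in rated)
        n_recommended += len(items)
    # Precision: share of recommended items that appear in the test set.
    precision = hits / n_recommended if n_recommended else 0.0
    # Recall: share of test ratings that were recommended.
    recall = hits / len(data_test) if len(data_test) else 0.0
    return precision, recall


# Toy test set: columns are user, item, rating.
data_test = np.array([[0, 1, 4], [0, 2, 5], [1, 0, 3], [1, 3, 2]])
recommendations = {0: [1, 3], 1: [0, 2]}

precision, recall = precision_recall(recommendations, data_test)
print(precision, recall)
print(rmse([3, 5], [4, 5]))
```

In this toy example each user gets one correct item out of two recommended, so precision and recall both come out to 0.5.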
That wraps up the two parts on Neighborhood-based Collaborative Filtering. Only Matrix Factorization is left before I close out this topic. Hopefully I will manage to finish it.
Below are the link to the source code and the references. See you in the next post!