Theory
What is ANNOY?
Tree-based algorithms are among the most widely used approaches to ANN (approximate nearest neighbor) search. They build a forest of trees by recursively partitioning the data into subsets. One of the best-known implementations is Annoy.
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings for finding the points in a space that are close to a given query point.
Nearest neighbors and vector models
The figure above shows a set of two-dimensional points, but in practice most vector models have far more than two dimensions. Our goal is to build a data structure that finds the points closest to any query point in sublinear time. To do that, we build a tree that can be queried in O(log n). This is how Annoy works: it uses a binary tree in which each node is a random split.
First we pick two random points and split the space in two with the hyperplane equidistant from them (figure above).
We then repeat the process inside each half, picking two new random points and splitting again, until every node contains at most K items. In the figure above, K = 10.
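As a rough sanity check on the O(log n) claim, the depth of one tree can be estimated from the number of points and the leaf size K. This is only a sketch: it assumes every split divides the points roughly in half, which random splits do not guarantee, and the function name is hypothetical.

```python
import math


def estimated_tree_depth(n_points: int, max_leaf_size: int) -> int:
    """Approximate depth of a balanced binary tree whose leaves hold at most max_leaf_size points."""
    if n_points <= max_leaf_size:
        return 0
    return math.ceil(math.log2(n_points / max_leaf_size))


for n in (1_000, 1_000_000):
    print(n, estimated_tree_depth(n, 10))
```

Under this idealization, going from a thousand to a million points adds only about ten levels to the tree, which is why a query stays cheap as the data set grows.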
The partitioned plane above corresponds to the binary tree shown above: each internal node is a split, and each leaf holds the points of one region. With K = 10, points that are close to each other in space are likely to end up close to each other in the tree. To search, we start at the root, check which side of each split the query point falls on, and go down the left or right child accordingly. When we reach a leaf, we sort the candidate points by distance to the query and return the nearest k neighbors. That is how the search algorithm works in Annoy.
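The build and search steps described above can be sketched in plain Python. This is a simplified illustration of a single random-projection tree, not Annoy's actual implementation: the real library combines several trees and uses a priority queue so it can explore both sides of a split. All function names here are hypothetical.

```python
import math
import random


def on_positive_side(point, normal, offset):
    """True if the point lies on the side of the hyperplane the normal points to."""
    return sum(n * x for n, x in zip(normal, point)) - offset >= 0


def build_tree(points, ids, max_leaf_size=10, rng=random):
    """Recursively split ids with the hyperplane equidistant from two random points."""
    if len(ids) <= max_leaf_size:
        return {"leaf": ids}
    a, b = (points[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    midpoint = [(x + y) / 2 for x, y in zip(a, b)]
    offset = sum(n * m for n, m in zip(normal, midpoint))
    left = [i for i in ids if on_positive_side(points[i], normal, offset)]
    right = [i for i in ids if not on_positive_side(points[i], normal, offset)]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return {"leaf": ids}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(points, left, max_leaf_size, rng),
        "right": build_tree(points, right, max_leaf_size, rng),
    }


def query(tree, points, q, k):
    """Descend to the leaf containing q, then rank that leaf's points by distance."""
    node = tree
    while "leaf" not in node:
        branch = "left" if on_positive_side(q, node["normal"], node["offset"]) else "right"
        node = node[branch]
    return sorted(node["leaf"], key=lambda i: math.dist(points[i], q))[:k]


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
tree = build_tree(points, list(range(len(points))), max_leaf_size=10, rng=rng)
print(query(tree, points, (0.5, 0.5), k=3))
```

Note that a single leaf may hold fewer than k points, and a true neighbor may sit just across a split. Both are reasons why searching one tree alone gives only approximate results.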
Annoy gives better results with more trees: each tree uses different random splits, so adding trees gives us more chances of getting a favorable split for any given query.
The "A" in Annoy stands for "approximate": the search may miss some of the true nearest neighbors, and that is considered acceptable. The whole idea behind an approximate algorithm is to sacrifice a little accuracy in exchange for much better performance.
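One common way to quantify how much accuracy is sacrificed is recall: the fraction of the true nearest neighbors that the approximate search returns. The sketch below compares an exact linear scan with a deliberately crude stand-in for an approximate index (it only looks at a random half of the points). The stand-in is an assumption made for illustration and is not how Annoy searches.

```python
import math
import random


def exact_neighbors(points, q, k):
    """Exact k nearest neighbors by linear scan: O(n) per query."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], q))[:k]


def recall(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(1000)]
q = (0.3, 0.7)

truth = exact_neighbors(points, q, k=10)

# Stand-in for an approximate index: only a random 50% of the points are examined.
subset = rng.sample(range(len(points)), 500)
approx = sorted(subset, key=lambda i: math.dist(points[i], q))[:10]

print(recall(approx, truth))
```

A recall of 1.0 means the approximate result is identical to the exact one. Lower values show how many true neighbors were traded away for speed.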
The face_recognition library
face_recognition is a well-known face recognition library with good accuracy. It is a Python library built on dlib, which is written in C++. The accuracy of its model on the Labeled Faces in the Wild benchmark is 99.38%.
GitHub: https://github.com/ageitgey/face_recognition
API: https://face-recognition.readthedocs.io/en/latest/
Install : pip install face_recognition
Practice
Step 1: Prepare the data
First, we prepare the image data set with a structure like this: one folder per person, each containing that person's images.
Step 2: Save data to Annoy
We will create a file named annoy_save.py that encodes the images in these directories and saves them to an Annoy index:
```python
import face_recognition
from annoy import AnnoyIndex
from imutils import paths
from tqdm import tqdm

NUMBER_OF_TREES = 100
f = 128  # dimensionality of a face encoding
t = AnnoyIndex(f, 'angular')

imagePaths = list(paths.list_images('/home/nguyen.trung.son/Documents/3D_Sun/3d_project/images'))


def image_encoding(imagePath):
    """Return the 128-dimensional encoding of the first face found in the image."""
    img = face_recognition.load_image_file(imagePath)
    img_ = face_recognition.face_locations(img)
    # Assumes at least one face was detected; img_[0] raises IndexError otherwise.
    top, right, bottom, left = img_[0]
    face = img[top:bottom, left:right]
    img_emb = face_recognition.face_encodings(face)[0]
    return img_emb


for i, imagePath in tqdm(enumerate(imagePaths)):
    img_emb = image_encoding(imagePath)
    t.add_item(i, img_emb)

t.build(NUMBER_OF_TREES)  # 100 trees
t.save('images.ann')
```
Explanation of the code:
t = AnnoyIndex(f, 'angular'): returns a new index, where f is the number of dimensions of the vectors and the metric is "angular".
img = face_recognition.load_image_file(imagePath): uses the face_recognition library, instead of OpenCV, to load the image.
img_ = face_recognition.face_locations(img): locates the faces in the image and returns their coordinates.
img_emb = face_recognition.face_encodings(face)[0]: encodes the face image as a 128-dimensional vector.
t.add_item(i, img_emb): adds item i with the vector img_emb.
t.build(NUMBER_OF_TREES): builds a forest of 100 trees. The more trees, the higher the accuracy of the queries. Once build has been called, no more items can be added.
t.save('images.ann'): saves the whole index to the file images.ann.
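To make the "angular" metric more concrete: the Annoy README describes it as the Euclidean distance between the normalized vectors, which equals sqrt(2(1 - cos(u, v))). The function below is a plain-Python sketch of that formula for illustration only; the name is hypothetical and it is not Annoy's own code.

```python
import math


def angular_distance(u, v):
    """sqrt(2 * (1 - cos(u, v))): Euclidean distance between the normalized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding slightly outside [-1, 1]
    return math.sqrt(2.0 * (1.0 - cos))


print(angular_distance([1.0, 0.0], [1.0, 0.0]))   # same direction
print(angular_distance([1.0, 0.0], [0.0, 1.0]))   # perpendicular
print(angular_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Because the vectors are normalized first, only their direction matters: scaling a vector does not change its angular distance to any other vector.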
Step 3: Load data from Annoy
Once the index has been saved as images.ann, we can load it from that file:
```python
from annoy import AnnoyIndex

f = 128  # must match the dimension used when the index was built
u = AnnoyIndex(f, 'angular')
u.load('images.ann')
```
Step 4: Get names
In Step 1 we put each person's images in a folder named after that person. Now we take the folder name for every image and append it to a list, in the same order in which the images were added to the index. The index only gives us back integer ids, so in Step 5 this list is what lets us turn an id into a name:
```python
import os
from imutils import paths
from tqdm import tqdm

known_face_names = []
imagePaths = list(paths.list_images('path_of_you'))  # the same image folder as in Step 2
for i, imagePath in tqdm(enumerate(imagePaths)):
    # The parent folder name is the person's name
    name = imagePath.split(os.path.sep)[-2]
    known_face_names.append(name)
```
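The name list only works if position i really corresponds to item i in the index. A minimal sketch of one way to keep the two aligned is to sort the paths before both indexing and building the name list. The sorting is a suggestion rather than something the original scripts do, and the helper names and example paths are hypothetical.

```python
import os
from pathlib import Path


def person_name(image_path: str) -> str:
    """Return the name of the folder that directly contains the image."""
    return Path(image_path).parent.name


def build_name_list(image_paths):
    """Position i holds the name for item id i, given the same path order used when indexing."""
    return [person_name(p) for p in image_paths]


# Sorting gives a reproducible order that both scripts can share.
image_paths = sorted([
    os.path.join("images", "bob", "1.jpg"),
    os.path.join("images", "alice", "2.jpg"),
    os.path.join("images", "alice", "1.jpg"),
])
known_face_names = build_name_list(image_paths)
print(known_face_names)
```

If an image is skipped while building the index (for example because no face was found), the same image has to be skipped here as well, otherwise every later id points at the wrong name.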
Step 5: Face recognition
By this point, about 70% of the work is done:
```python
import cv2
import face_recognition

video_capture = cv2.VideoCapture(0)  # assumption: default webcam (not shown in the original)
face_names = []

while True:
    ret, frame = video_capture.read()
    if not ret:
        break

    # Resize the frame to 1/4 size so that face detection runs faster
    small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)

    # Convert from BGR (OpenCV's default) to RGB; cvtColor returns a contiguous array
    rgb_small_frame = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)

    face_names = []
    face_locations = face_recognition.face_locations(rgb_small_frame)
    face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)

    for face_encoding in face_encodings:
        # Get the id of the nearest vector in the Annoy index
        matches_id = u.get_nns_by_vector(face_encoding, 1)[0]
        # Fetch the stored vector for that id
        known_face_encoding = u.get_item_vector(matches_id)
        # compare_faces returns a list of booleans: True for a match, False otherwise
        compare_faces = face_recognition.compare_faces([known_face_encoding], face_encoding)
        name = "unknown"
        if compare_faces[0]:
            # Look up the name in the list built in Step 4, using the same id
            name = known_face_names[matches_id]
        face_names.append(name)
    print(face_names)
```
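The extra compare_faces check matters because a nearest-neighbor query always returns the closest stored item, even when the person in front of the camera is not in the database at all. face_recognition.compare_faces treats two encodings as a match when their Euclidean distance is at most a tolerance, which defaults to 0.6. The sketch below reproduces that decision in plain Python with toy 3-dimensional vectors; the helper names are hypothetical, and real encodings have 128 dimensions.

```python
import math

DEFAULT_TOLERANCE = 0.6  # default tolerance of face_recognition.compare_faces


def is_match(known_encoding, candidate_encoding, tolerance=DEFAULT_TOLERANCE):
    """True when the Euclidean distance between the encodings is within the tolerance."""
    return math.dist(known_encoding, candidate_encoding) <= tolerance


def resolve_name(nearest_id, known_encoding, candidate_encoding, names,
                 tolerance=DEFAULT_TOLERANCE):
    """Return the stored name only if the nearest neighbor is close enough."""
    if is_match(known_encoding, candidate_encoding, tolerance):
        return names[nearest_id]
    return "unknown"


names = ["alice"]
stored = [0.0, 0.0, 0.0]
print(resolve_name(0, stored, [0.1, 0.0, 0.0], names))  # close to the stored vector
print(resolve_name(0, stored, [1.0, 0.0, 0.0], names))  # nearest item, but too far away
```

Lowering the tolerance makes the match stricter, at the cost of rejecting more genuine matches.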
After locating each face and finding the person's name in the database, we draw the result on the camera frame:
```python
    # (still inside the while loop)
    for (top, right, bottom, left), name in zip(face_locations, face_names):
        # Detection ran on a frame scaled down to 1/4 size, so multiply
        # the coordinates by 4 to map them back to the original frame
        top *= 4
        right *= 4
        bottom *= 4
        left *= 4

        # Draw a box around the face
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

        # Draw a filled label with the name at the bottom of the box
        cv2.rectangle(frame, (left, bottom - 35), (right, bottom), (0, 0, 255), cv2.FILLED)
        font = cv2.FONT_HERSHEY_DUPLEX
        cv2.putText(frame, name, (left + 6, bottom - 6), font, 1.0, (255, 255, 255), 1)
        # output_names is assumed to be a list created before the loop
        output_names.append(name)

    cv2.imshow('Video', frame)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video_capture.release()
cv2.destroyAllWindows()
```
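The hard-coded factor of 4 is tied to the resize factor of 0.25 used earlier. As a small sketch, the mapping can be derived from the scale itself so that the two values cannot drift apart. The helper name is hypothetical, and the boxes follow the (top, right, bottom, left) order used above.

```python
def scale_box(box, scale):
    """Map a (top, right, bottom, left) box from the resized frame back to the original frame."""
    top, right, bottom, left = box
    return tuple(int(round(v / scale)) for v in (top, right, bottom, left))


RESIZE_SCALE = 0.25  # same value as fx and fy in cv2.resize
print(scale_box((10, 50, 40, 20), RESIZE_SCALE))
```

Changing RESIZE_SCALE in one place then updates both the detection size and the coordinates used for drawing.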
Result
If anything in this article is unclear or incorrect, please leave a comment below. Thank you.
Reference
https://github.com/spotify/annoy
https://github.com/ageitgey/face_recognition
https://www.pyimagesearch.com/2018/09/24/opencv-face-recognition/
https://face-recognition.readthedocs.io/en/latest/readme.html#installation