Collecting Data for Deep Learning – About ImageNet

Tram Ho

Dataset Acquisition for Deep Learning

 

Computer Vision (CV) is a science whose purpose is to help computers identify, understand, and process image and video data in order to extract the required information. CV has made great strides thanks to Deep Learning, IoT, and Cloud Computing; today’s models achieve high accuracy and reliability on many problems, such as image classification, object detection, segmentation, and image generation. This is the premise for building and developing AI applications in the future.

Basically, the steps to solve a Deep Learning problem in CV are as follows:

1. Collect data for the problem (Collecting Dataset).

2. Most CV problems are Supervised Learning, so the collected data must be labeled (Labeling Dataset).

3. Choose Deep Learning models suitable for the problem -> conduct training -> conduct tests and assessments (Test and Evaluate Model).

4. Repeat the above steps until acceptable quality requirements are met.

We mostly focus on step 3: choosing models and methods, and tuning model hyperparameters (metrics, optimizers, activation functions, ...) in order to achieve the lowest error rate (finding a better algorithm to make better decisions).

An important thing that is often overlooked when starting a CV Deep Learning problem is the data-collection step. A simple example is labeled training images (besides images there are also videos and 3D point clouds) for binary, multiclass, or multi-label classification problems, i.e. a set of labeled images belonging to different classes (for example, labels like dog, cat, chicken, etc.). When we study CV or Deep Learning, we mostly work on pre-labeled collections. But in reality, each problem needs its own data, which rarely matches the available datasets!

Deep Learning models cannot work without data. If the dataset is too small, it can lead to overfitting: the model cannot fully learn the underlying features, or in other words, it lacks the ability to generalize. One workaround is to apply data augmentation to the existing training data, but the generated images are more or less “remixed” from existing pixels (still inter-correlated), so the model still struggles to escape overfitting (additional methods such as weight decay, dropout, ... are needed). So how do we find “enough” training data and label it? This is arguably the most laborious job in CV Deep Learning – an expensive task! In this article I will cover some techniques to find and collect a training dataset good enough for you to train and test your model. In addition, I will talk about a very common dataset in the Deep Learning community: ImageNet.
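As a minimal illustration of data augmentation (and of why augmented copies stay correlated with the originals), here is a horizontal flip written in pure Python; real pipelines would use a library such as OpenCV or torchvision, and the function name here is my own:

```python
def horizontal_flip(img):
    """Flip an image (a nested list of pixel rows) left-to-right.

    The flipped copy contains exactly the same pixel values as the
    original, just rearranged -- which is why augmented data remains
    strongly correlated with its source image.
    """
    return [list(reversed(row)) for row in img]

img = [[1, 2, 3],
       [4, 5, 6]]
print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4]]
```

Each flipped image doubles the nominal dataset size, but brings far less new information than a genuinely new photo would.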

Note: after collecting data, it is necessary to pre-process it, because most collected data is raw, with varying height, width, aspect ratio, ..., and cannot be fed directly into Deep Learning models! We usually rely on libraries like OpenCV or Scikit-Image for image preprocessing.
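To make the preprocessing note concrete, here is a minimal nearest-neighbour resize in pure Python. In practice you would call a library routine such as OpenCV’s `cv2.resize`; this sketch only shows the idea of forcing every image to a common height and width:

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a nested-list image to (out_h, out_w)."""
    in_h, in_w = len(img), len(img[0])
    # For each output pixel, pick the nearest source pixel by index scaling.
    return [[img[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)]
            for i in range(out_h)]

img = [[1, 2],
       [3, 4]]
resized = resize_nearest(img, 4, 4)  # every image now has the same shape
```

After a pass like this, all images share one shape and can be stacked into a single training batch.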

“Data is like garbage. You’d better know what you are going to do with it before you collect it.” – Mark Twain

Step 1: Collecting data (Collecting Dataset)

First we need to thoroughly understand the problem to be solved and its business value, so that we can find the right training data for it! With classification problems, you can use the class names as keywords and use crawling tools to find images on the Internet. You can also find photos and videos from social networking sites, satellite images on Google, free data from public cameras or cars (Waymo, Tesla), or even buy data from third parties (note: check the accuracy of the data). It is also worth checking whether existing public datasets are relevant to your problem; below are some common datasets:

· Common Objects in Context (COCO)

· ImageNet

· Google’s Open Images

· KITTI

· The University of Edinburgh School of Informatics’ CVonline: Image Databases

· Yet Another Computer Vision Index To Datasets (YACVID)

· Mldata.io

· CV datasets on GitHub

· ComputerVisionOnline.com

· Visualdata.io

· Mighty.ai

· UCI Machine Learning Repository

· Udacity Self driving car datasets

· Cityscapes Dataset

· Autonomous driving dataset by Comma.ai

· MNIST handwritten datasets
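Returning to the keyword-based crawling approach from the start of this step, it can be sketched as follows; the query templates and function name are my own illustration, not part of any crawling library:

```python
def build_search_queries(class_names, templates=("{}", "{} photo", "{} close up")):
    """Expand each class name into several search-engine queries.

    More query variants per class usually yields a more diverse set of
    crawled images for that label.
    """
    return {name: [t.format(name) for t in templates] for name in class_names}

queries = build_search_queries(["dog", "cat"])
print(queries["dog"])  # ['dog', 'dog photo', 'dog close up']
```

The resulting queries would then be fed to a crawler or search API, with the class name kept alongside each downloaded image as its provisional label.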

Step 2: Label data

This is an important step because it decides whether our model will work well or not! Wrongly labeled data will make the model predict and evaluate incorrectly -> a waste of training time and effort. There are two issues to note:

· How to label data?

· Who will label the data?

We will go through each issue in turn!

Problem 1 – How to label data? After finding the dataset, determine what type of problem it is – for example classification, object detection, segmentation, ... – and then process and label the data accordingly! In the case of classification, the labels are the keywords used when searching and crawling data from the Internet. Instance segmentation needs a label for each pixel of the image. Here we need tools to perform image annotation (i.e. assign labels and metadata to images). Common tools include Comma Coloring, Annotorious, LabelMe, ... These tools provide a GUI for labeling each segment of the image. For example:

Source: http://labelme.csail.mit.edu/Release3.0/
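For the classification case mentioned above, where the crawl keyword doubles as the label, a common convention is one folder per class; here is a small sketch of reading labels back from that layout (the paths and file names are hypothetical):

```python
import os

def labels_from_folders(root):
    """Map each image path to a label taken from its parent folder name.

    Assumes a layout like root/dog/img1.jpg, root/cat/img2.jpg, ...
    """
    samples = []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if os.path.isdir(class_dir):
            for fname in sorted(os.listdir(class_dir)):
                samples.append((os.path.join(class_dir, fname), label))
    return samples
```

Calling `labels_from_folders("data/train")` on such a tree yields `(path, label)` pairs ready for a training loop, with no separate annotation file needed.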

However, labeling with these GUI tools is quite manual and time-consuming! A faster way is to use algorithms like Polygon-RNN++ (https://arxiv.org/abs/1803.09693) or Deep Extreme Cut (https://arxiv.org/abs/1711.09081). Polygon-RNN++ takes an object in the image as input and outputs the polygon points surrounding that object, shaping the segments in the image and making labels easier to assign. Deep Extreme Cut works on the same principle as Polygon-RNN++, but with a fixed polygon of 4 points. More details can be found here: https://www.youtube.com/watch?v=evGqMnL4P3E

It is also possible to use transfer learning to tag data, by using models pre-trained on large-scale datasets such as ImageNet or Open Images. Pre-trained models have “learned” many features from millions of different images, so their accuracy is quite high. Based on these models, we can find and label the bounding boxes of each object in the image. Note that the data these models were pre-trained on must be similar to the collected dataset in order to perform feature extraction or fine-tuning.
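One hedged sketch of how pre-trained features can help with labeling: extract a feature vector per image with the pre-trained model (not shown here), then assign each unlabeled image the label of its nearest labeled neighbour in feature space. The feature vectors below are toy stand-ins, and the function name is my own:

```python
def nearest_label(features, labeled):
    """Return the label of the closest labeled feature vector (squared L2)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Pick the labeled example whose feature vector is closest.
    best = min(labeled, key=lambda item: sqdist(features, item[0]))
    return best[1]

labeled = [([1.0, 0.0], "dog"), ([0.0, 1.0], "cat")]
print(nearest_label([0.9, 0.2], labeled))  # dog
```

Labels proposed this way still need human review, but reviewing suggestions is far faster than labeling from scratch.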

“The snowball effect” – after using the above data to train a model for the problem, we can reuse that model to label new data.

Problem 2 – Who will label the data? There are 2 different types:

In-house: you label the data yourself, or ask relatives and friends to help! If resources allow, you can set up a team to label the data. Pros: easy to control the accuracy of the data, low cost. Cons: collecting and labeling the data takes a lot of time.

Out-source: rely on third parties – companies and services that specialize in providing data according to business requirements. Pros: the data can be gathered quickly. Cons: the data’s transparency and accuracy must be verified, and it is costly!

We can also use online workforce resources like Amazon Mechanical Turk (https://www.mturk.com/) or Crowdflower (http://www.crowdflower.com/). In short, we rely on the online community to label the data for us, usually for a fee. This is also how big datasets like ImageNet or Microsoft COCO were born. However, the accuracy and organization of the data is an issue we need to consider.

Depending on the conditions and requirements of each problem, you need to choose the appropriate options!


Brief Introduction to ImageNet

In this section I will briefly go over ImageNet, a very popular dataset in Deep Learning CV, with over 14 million images and more than 20,000 different classes: the difficulties at the start of the project, and how it was formed and born to serve CV with the purpose of “mapping out the entire world of objects.”

1. What is ImageNet?

ImageNet was originally a project launched with the purpose of collecting, labeling, and classifying images into different “synsets” according to the structure of WordNet (a large lexical database of English).

The project was started in 2006 by Ms. Fei-Fei Li – a chief scientist at Google Cloud, a professor at Stanford, and director of the university’s AI lab.

2. ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

This famous dataset was used in Computer Vision competitions under the name ILSVRC. The competition was held from 2010 to 2017 (http://image-net.org/challenges/LSVRC/) with the purpose of building models with high accuracy for the image classification problem. The winning models were CNN Deep Learning models (AlexNet, SqueezeNet, VGGNet, GoogLeNet, ResNet, SENet). This dataset has become a de facto standard for classification algorithms in CV and opened a new era for Deep Learning.

Within 7 years (2010–2017), the winning accuracy on the ImageNet dataset increased from 71.8% to 97.3%, arguably surpassing human recognition ability and demonstrating that bigger data leads to better decisions!

3. Short history and Challenges

The first ideas for the project started in 2006, when Ms. Fei-Fei Li was a professor at the University of Illinois Urbana-Champaign (UIUC). During her research on CV, Ms. Li realized that her models and algorithms could not work well without a sufficiently large dataset reflecting the many facets of reality, and she proposed building a dataset with those properties. “We decided we wanted to do something that was completely historically unprecedented. We’re going to map out the entire world of objects,” Ms. Li said.

Ms. Li embarked on the project by studying different research on how to represent real-world data. In the process, she found the structure of WordNet best suited to her approach. She met with Professor Christiane Fellbaum (Princeton University) to discuss how to build a system mapping images to each category of WordNet.

WordNet is a project to build a hierarchical structure for the English language, started in the mid-1980s by Professor George A. Miller (Princeton University). The English vocabulary is arranged like the index of a dictionary, but the words are explicitly related and hierarchical. For example, “dog” belongs to the “canine” class, and “canine” belongs to the “mammal” class.

A month later, Ms. Li joined Princeton University and worked on the project with two colleagues, Professor Kai Li and assistant professor Jia Deng. Deng later took over from Ms. Li to continue the ImageNet project from 2017.

Source: http://www.image-net.org/papers/ImageNet_2010.pdf

Ms. Li’s first idea was to hire the university’s students to search for images on the Internet and add them to each category of the dataset, at a cost of $10 per student. But after some calculations, Ms. Li realized that this way the project would take at least 19 years to complete.


The idea of hiring students was abandoned, and Ms. Li and her team met to find a better one: using Computer Vision algorithms to find and categorize images from the Internet. But if CV algorithms were used, the dataset would later be limited by the algorithms themselves and lose its generality. Hiring students was time-consuming, the algorithms were not good enough, and the team did not have enough funding; the project seemed to come to a standstill – “Li said the project failed to win any of the federal grants she applied for, receiving comments on proposals that it was shameful Princeton would research this topic, and that the only strength of the proposal was that Li was a woman.”

Ms. Li finally found a solution for her project when a student suggested she use Amazon Mechanical Turk (https://www.mturk.com/), a crowdsourcing marketplace. This completely changed the ImageNet project. “He showed me the website, and I can tell you literally that day I knew the ImageNet project was going to happen,” Ms. Li said. “Suddenly we found a tool that could scale, that we could not possibly dream of by hiring Princeton undergrads.”


But using Amazon Mechanical Turk also brought certain difficulties with the accuracy and structure of the system. How many Turkers should be hired to find and categorize each image? If Turkers disagree about the object in an image, how should the conflict be resolved? What validation mechanism is needed to ensure accuracy? What happens when Turkers try to cheat or sabotage the project? These are just a few of the difficulties Li’s team faced. Ms. Li and her colleagues presented solutions to these problems in the paper “ImageNet: A Large-Scale Hierarchical Image Database” (http://www.image-net.org/papers/imagenet_cvpr09.pdf), by building voting mechanisms, enhancements, quizzes, and statistical models.
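The voting idea can be illustrated with a minimal majority-vote aggregator; this is my own sketch, not the paper’s actual algorithm: a label is accepted only once enough workers agree on it.

```python
from collections import Counter

def aggregate_votes(votes, min_agree=2):
    """Accept the majority label if at least `min_agree` workers chose it."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None

print(aggregate_votes(["dog", "dog", "cat"]))  # dog
print(aggregate_votes(["dog", "cat"]))         # None (no consensus)
```

Images without consensus can then be sent back out for more votes instead of entering the dataset with a doubtful label.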

After adopting Amazon Mechanical Turk, the ImageNet project took about 3 years to complete. The dataset originally consisted of 3.2 million images, divided into 5,247 categories and 12 subtrees such as “mammal,” “vehicle,” and “furniture.” In 2009, Ms. Li and her team published the paper “ImageNet: A Large-Scale Hierarchical Image Database” and presented the research at the Conference on Computer Vision and Pattern Recognition (CVPR). ImageNet and ILSVRC have since ushered in a new era for CV and Deep Learning.

“One thing ImageNet changed in the field of AI is suddenly people realized the thankless work of making a dataset was at the core of AI research,” Ms. Li said. “People really recognize the importance the dataset is front and center in the research as much as algorithm.”

Conclusion

In this article, I have introduced some techniques to find and collect datasets for training Deep Learning models. Pay attention to the accuracy and structure of the dataset in order to build a model with the highest possible accuracy. You have also learned more about ImageNet, a very common dataset in Deep Learning: its initial idea, how it was formed, and the challenges during the build process. With this knowledge, I believe you will know how to build a good dataset and create the best Deep Learning models! Happy learning!


Source: Techtalk