Web Scraping with Scrapy – Part 1

Tram Ho

Hello everyone. Machine learning is a hot trend in the era of Industry 4.0. To work with it, one of the most important things is data: the more data you have and the more reliable it is, the better the training goes. In this article, I would like to introduce how to collect data with Scrapy.

Setup

The first requirement for using Scrapy is to install Python 3 and Scrapy itself (of course ^^).

Python3

1. Open the terminal and enter the command

2. Install Python 3.6 with the command

Scrapy

1. Install Scrapy with pip: pip install scrapy
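To confirm that the setup worked, a small check like the one below can help. It is only a sketch using the standard library: the helper names are made up for illustration, and the 3.6 minimum simply mirrors the version mentioned above (newer Scrapy releases may require a more recent Python).

```python
import importlib.util
import sys


def python_is_recent_enough(minimum=(3, 6)):
    # Compare the running interpreter's (major, minor) with the minimum.
    return sys.version_info[:2] >= minimum


def scrapy_is_installed():
    # find_spec returns None when the package cannot be imported.
    return importlib.util.find_spec("scrapy") is not None


print("Python:", ".".join(map(str, sys.version_info[:3])))
print("Python 3.6 or newer:", python_is_recent_enough())
print("Scrapy importable:", scrapy_is_installed())
```

If the last line prints False, Scrapy was probably installed for a different Python interpreter than the one running the check.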

Writing the first program

In this article I will use the website https://9to5mac.com/ as the demo target for scraping.

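Initialize the project

1. Open a terminal and initialize the Scrapy project with the scrapy startproject command, followed by a project name

2. Use cd to move into the project directory, then create a spider with the scrapy genspider command, followed by a spider name and the target domain

Here is what I entered: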
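As a rough sketch of these two steps, the commands can also be assembled and launched from Python. The names demo_project and mac_news are placeholders chosen for illustration, not the names used in this article, and the helper functions are hypothetical. Only the scrapy startproject and scrapy genspider subcommands themselves come from Scrapy.

```python
import shutil
import subprocess


def startproject_command(project_name):
    # Equivalent to typing: scrapy startproject <project_name>
    return ["scrapy", "startproject", project_name]


def genspider_command(spider_name, domain):
    # Equivalent to typing: scrapy genspider <spider_name> <domain>
    return ["scrapy", "genspider", spider_name, domain]


def run(command, cwd=None):
    # Run the command only when its executable is found on PATH.
    if shutil.which(command[0]) is None:
        print("not found on PATH:", command[0])
        return None
    return subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    # Placeholder names: replace them with your own project and spider.
    print(" ".join(startproject_command("demo_project")))
    print(" ".join(genspider_command("mac_news", "9to5mac.com")))
```

Calling run(startproject_command("demo_project")) and then run(genspider_command("mac_news", "9to5mac.com"), cwd="demo_project") would perform both steps, with cwd playing the role of the cd in step 2.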

Write the code for the first spider

1. Open the project in your IDE, then open the spider file in the spiders folder.

2. To begin, import the Request object (from scrapy import Request).

3. Start the request by defining the start_requests method.

The parse_info callback receives the response object for the HTML page.

4. Create the parse_info method.

We then run the spider from the project directory with the scrapy crawl command, followed by the spider's name:
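For illustration, the run command can be built like this. The spider name mac_news and the output file items.json are placeholders, and crawl_command is a hypothetical helper. The -o option asks Scrapy to save the scraped items to a file.

```python
import shlex


def crawl_command(spider_name, output_file=None):
    # Base command: scrapy crawl <spider_name>
    command = ["scrapy", "crawl", spider_name]
    if output_file is not None:
        # -o stores the scraped items in the given file.
        command += ["-o", output_file]
    return command


# Print the command line to type in the terminal (placeholder names).
print(shlex.join(crawl_command("mac_news", "items.json")))
```

The command has to be run from inside the project directory, where scrapy.cfg lives, so that Scrapy can find the spider.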

As the image below shows, the data has been scraped.

Conclusion

Okay my friends, in this article I gave a brief introduction to Scrapy. In the next article I will introduce its underlying engine. Thank you for your interest.

Source : Viblo