Handling JSON and XML files in Python

Tram Ho

Hello, friends. Today I will show you how to handle JSON and XML files in Python.

Handling JSON files

JSON is one of the most popular data exchange formats today. Its structure is simple and maps closely onto Python's own dictionaries and lists, so working with JSON in Python is easy to understand.

Load files from the Internet

JSON data usually comes from another source (a file, the Internet, and so on), so this section starts by showing how to download a JSON file from the Internet and then parse the downloaded content.

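In Python 2, use the urllib2 module to download the file and the json module to encode and decode JSON data. In Python 3, urllib2 no longer exists; its functionality lives in urllib.request.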
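A minimal sketch of this step, assuming Python 3 and therefore urllib.request rather than urllib2. The helper names fetch_json and decode_json_bytes, the User-Agent string, and the timeout are illustrative choices, not part of the original article.

```python
import json
import urllib.request


def decode_json_bytes(raw, encoding="utf-8"):
    """Decode a raw HTTP body (bytes) and parse it as JSON."""
    return json.loads(raw.decode(encoding))


def fetch_json(url, timeout=10):
    """Download `url` and return the parsed JSON document.

    The User-Agent value is a hypothetical label chosen for this sketch.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "python-json-example"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decode_json_bytes(response.read())


# Example call (needs network access, so it is left commented out):
# repos = fetch_json("https://api.github.com/users/voduytuan/repos")
# for repo in repos:
#     print(repo["name"])
```

Keeping the decoding step in its own function makes it possible to try the parsing logic on local bytes without touching the network.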

The example for this step queries https://api.github.com/users/voduytuan/repos to retrieve my GitHub repository list in JSON format.

Parsing JSON Data

If you already have JSON data as a string and want to parse it into Python data, use the following method:
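A small sketch of parsing with json.loads from the standard library. The JSON text below is made-up sample data, used only to show how JSON types map to Python types.

```python
import json

# Hypothetical JSON text standing in for data read from a file or a response.
raw = (
    '{"name": "example-repo", "stars": 42, "private": false, '
    '"topics": ["json", "xml"], "license": null}'
)

data = json.loads(raw)  # a JSON object becomes a dict

print(data["name"])                      # JSON strings become str
print(data["stars"] + 1)                 # JSON numbers become int or float
print(data["topics"][0])                 # JSON arrays become lists
print(data["private"], data["license"])  # false -> False, null -> None

# Malformed input raises json.JSONDecodeError (a subclass of ValueError).
error_message = None
try:
    json.loads("{not valid json}")
except json.JSONDecodeError as exc:
    error_message = str(exc)
    print("Could not parse:", error_message)
```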

Encoding JSON Data

If you already have a Python value and want to encode it as a JSON string, you can do so as follows:
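A short sketch of encoding with json.dumps. The record below is invented sample data; indent, sort_keys and ensure_ascii are optional arguments shown only to illustrate common formatting choices.

```python
import json

# Hypothetical Python value to serialize.
record = {"name": "Hà Nội", "tags": ("a", "b"), "ok": True, "missing": None}

# Default output: a single line, with non-ASCII characters escaped as \uXXXX.
compact = json.dumps(record)

# Human-readable output: indented, keys sorted, non-ASCII characters kept.
pretty = json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)

print(compact)
print(pretty)

# Decoding again yields equal data, except that the tuple comes back as a list.
round_trip = json.loads(compact)
```

Note that tuples are written as JSON arrays, so they return as lists when the string is decoded again.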

XML file handling

In this section, we will parse XML content into data we can process. To do this, we will use the Beautiful Soup 4 library, which makes parsing HTML and XML quick and convenient.

Install Beautiful Soup

Installation instructions can be found on the website http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-beautiful-soup.

On macOS, it can be installed with pip by running: pip install beautifulsoup4

Installing the lxml parser

To parse XML with Beautiful Soup, I use an XML parser called lxml. See the installation instructions at http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser

On macOS, it can be installed with pip by running: pip install lxml

Example of parsing XML

Consider the following example:
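A minimal sketch of such an example, assuming beautifulsoup4 and lxml are installed. The catalog document and the summarize helper are hypothetical stand-ins, since the article's original XML is not reproduced here.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this sketch.
XML_DOC = """
<catalog>
  <book id="b1"><title>Python Basics</title><price>10.50</price></book>
  <book id="b2"><title>XML in Practice</title><price>7.25</price></book>
</catalog>
"""


def summarize(xml_text):
    """Return the first title and an (id, title, price) tuple per book."""
    # Passing "xml" as the second argument selects lxml's XML parser.
    soup = BeautifulSoup(xml_text, "xml")

    # find() returns the first matching tag.
    first_title = soup.find("title").string

    rows = []
    # find_all() returns every matching tag.
    for book in soup.find_all("book"):
        # book["id"] reads an attribute; book.title and book.price are
        # shortcuts for book.find("title") and book.find("price").
        rows.append((book["id"], book.title.string, float(book.price.string)))
    return first_title, rows


first_title, rows = summarize(XML_DOC)
print("First title:", first_title)
for book_id, title, price in rows:
    print(book_id, title, price)
```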

When run, the example prints the values it extracts to the terminal.

A BeautifulSoup object gives quick and convenient access to the elements of the XML file.

The example shows several ways to access elements:

  • find_all() (also available under its older name, findAll()): returns a list of all tags with the given name
  • find(): returns the first tag with the given name, or None if there is no match
  • Direct access through tag names, such as x.price.string

Parsing HTML

As with XML, BeautifulSoup can parse HTML content: pass the markup to the constructor and select an HTML parser, such as "html.parser" or "lxml", as the second argument.
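A brief sketch of HTML parsing, assuming beautifulsoup4 is installed. It uses "html.parser", which relies on Python's built-in HTML parser, so lxml is not needed here. The page content and the extract_links helper are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML page, invented for this sketch.
HTML_DOC = """<html><head><title>Demo page</title></head>
<body>
<h1>Links</h1>
<a href="https://example.com/a">First</a>
<a href="https://example.com/b">Second</a>
</body></html>"""


def extract_links(html_text):
    """Return the page title and a (text, href) tuple for every link."""
    soup = BeautifulSoup(html_text, "html.parser")
    title = soup.title.string
    links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]
    return title, links


title, links = extract_links(HTML_DOC)
print(title)
for text, href in links:
    print(text, "->", href)
```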

Above, I have introduced how to handle JSON and XML files in Python, along with some basic examples. If you have any questions, please leave a comment below.




