Effortless handling of Microsoft Office Word with python-docx

Tram Ho

Microsoft Word (MS Word), part of the Microsoft Office suite, is one of the most popular applications for creating documents, and it supports reading and writing content from simple to complex. Although people can create and edit Word files by hand, many tasks need a computer to process files and generate their content automatically. For example, if you extract content from a PDF file and want to convert it to a docx file, or you are developing a natural language processing model and need to read Word files, python-docx is a library well worth choosing.

Today I would like to show you how to create Word files automatically, as well as add, edit, and delete content, with the help of the python-docx library.

1. Install the library

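If you are using Anaconda, you can install the library from the conda-forge channel by running conda install -c conda-forge python-docx. With plain pip, run pip install python-docx instead.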

2. Initialize the file

To open an existing file, pass its path to Document.

If the file does not exist yet, create a blank document by calling Document() with no arguments and keep the result in a variable, for example document.

After you create the file, you can edit its contents, such as adding paragraphs or tables, through this Document object (called document in this article). Once done, save your changes by calling document.save(filename).

Here, filename is the name of the file you want to save to. Of course, it should end with the .docx extension.

3. Heading, title

The python-docx library supports writing the title and headings of a document at the level the user specifies, through the add_heading method of the document. The method takes two arguments:

  • Content : the text of the title or heading
  • Level : the rank of the heading, an integer from 0 to 9. The smaller the number, the bigger the font.

3.1. Title

In python-docx, the title is the heading with level 0.

The result is a paragraph formatted with Word's built-in Title style.

3.2. Heading

For headings, we have levels 1, 2, 3, …

  • Level 1 : the top-level heading, formatted with Word's built-in "Heading 1" style
  • Level 2 : a sub-heading, formatted with the "Heading 2" style

4. Paragraph

In ordinary documents, there are two ways to lay out the content of a page:

  • Traditional layout : content is displayed from top to bottom, from left to right
  • Column layout : content is organized into separate columns

4.1. Traditional layout

With the traditional layout, we write content to the Word file by calling document.add_paragraph with the text of the paragraph.

4.2. Column layout

To create a document laid out in columns, we need the concept of a section. A section is a block of consecutive content that shares the same page settings (page size, margins, number of columns), and it can contain many paragraphs. We set the column layout on the section, and every page in that section follows it.

Create the column layout for the section:

Then we add paragraphs as in the traditional layout. The text fills the columns in order, from left to right.

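In addition, we can set the alignment of a paragraph through its alignment property, using the WD_ALIGN_PARAGRAPH values:

  • Left align : WD_ALIGN_PARAGRAPH.LEFT
  • Right align : WD_ALIGN_PARAGRAPH.RIGHT
  • Center align : WD_ALIGN_PARAGRAPH.CENTER
  • Justify (align to both sides) : WD_ALIGN_PARAGRAPH.JUSTIFY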

4.3. Add sentences to the paragraph

The python-docx library also lets you append individual pieces of text, called runs, to a paragraph that has already been created, through the paragraph's add_run method.

4.3.1. Highlight background

You can also highlight the background of each sentence with the color you like, by setting the font.highlight_color of its run to a WD_COLOR_INDEX value (the name of the highlight color).

4.3.2. Bold, italic, underline

You can also emphasize text by making it bold, italic, or underlined, just as you would directly in Word, by setting the bold, italic, or underline property of a run to True.

5. Picture

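You can also insert an image into the document with python-docx, either from the path to the image file or from an image already held in memory. An image represented as a matrix must first be encoded into an image format such as PNG and wrapped in a file-like object. You can also adjust the picture size to match your text.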

Epilogue

Python-docx is a powerful library for creating or modifying docx documents. However, to take full advantage of the features of Microsoft Word, you need to dig deeper into the Microsoft Word API. If you just want to use simple features, python-docx is still a great choice. Thank you for reading.

Source : Viblo