Export a docx file with python-docx in a Django app

Tram Ho

Exporting files is a common feature that lets users get their data out of an application.

On the backend, my Django app can export a docx file using a library named python-docx.

Install python-docx and get basic knowledge

Installing python-docx takes a single command:

pip install python-docx

If you are working with Docker and have a requirements.txt file that lists all your libraries, don’t forget to add the pinned “python-docx” version to it.

You can find the exact installed version by running “pip freeze”, which lists the name and version of every installed library, and looking for the python-docx entry:

python-docx==0.8.10
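The same lookup can also be done from Python. The snippet below is only a sketch: the helper name installed_version is made up for illustration, and it relies on the standard library module importlib.metadata (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "python-docx"):
    """Return the installed version string of `package`, or None if absent.

    `installed_version` is a hypothetical helper name, not part of python-docx.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


pinned = installed_version()
if pinned is not None:
    # Print a line in the format that requirements.txt expects.
    print(f"python-docx=={pinned}")
```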

Before going into detail about how to create a view that exports a docx file with python-docx, it will help to take a look at the official python-docx documentation first.

Create the view to export the docx file

As usual, to create an API for downloads, we write a view that allows the GET method only.

After creating an empty document, we need to save it and send it in the response. In python-docx, Document.save() accepts a stream as well as a file name. Thus, we can initialize an io.BytesIO() object, save the document into it, and then send its contents to the user.

We use StreamingHttpResponse, which suits large payloads, and set content_type to “application/vnd.openxmlformats-officedocument.wordprocessingml.document” for the docx file.

At this point, we can download a .docx file named Test.docx and see that it is empty.

Build detailed content for the document

After downloading an empty docx file, we start building the content for the document. Please follow the python-docx documentation.

Basically, you can add heading text with the “document.add_heading()” method and paragraphs with the “document.add_paragraph()” method.

If you want to style part of the text, you can call add_run() on a paragraph and set properties such as bold or italic on the returned run.

For example, I created a build_document() method that builds all the content of the document:

Then I replace the creation of an empty document in the view with:

document = self.build_document()

And here is the exported result so far:

Build HTML content in the document using HTMLParser

At this point, I could export a docx file with content in it. At first, I simply added each field to a paragraph:

But the output looked odd when a field was saved as HTML.

So I needed to figure out a way to convert the HTML content to text while keeping basic styles like italic, bold, or bullet points, like so:

After some research, I found the Python standard library module “html.parser — Simple HTML and XHTML parser” and followed an example to create a class named DocumentHTMLParser to handle it, like this:

In general, the code above overrides the handler methods of the HTMLParser class and uses runs on the paragraph to apply a style based on the start tag. If a tag needs a line break when it ends, we add a break for it.

Then I use this custom class in my view to handle the HTML content:

And this is how the HTML content looks in the exported docx:

Exporting in a Django app is interesting, and Python has many useful libraries for handling content formats.

We have just gone through a simple example of exporting docx files from a Django app. If you know a better way, or if anything looks not quite right, please let me know in the comments.

As usual, this post was originally published on my personal blog.

Thank you,

BeautyOnCode

Source : Viblo