Sample: using Spring Batch to process CSV order data

Tram Ho

Hello everyone, today I would like to share my first experience with Spring Batch. Reference: https://spring.io/guides/gs/batch-processing/

I only recently started learning how to build batches with the Spring Framework. I am writing this post to reinforce what I have learned, and I hope it helps anyone who wants to learn about it too.

1. Problem

Suppose we run an electronics store where staff record each day's orders in CSV format.

However, the system cannot yet import these CSV files into the database, so we need a batch that runs at the end of each day to do this.

1.1. (Input) Contents of the order CSV

orders.csv

The file records, for each order, which customer bought which item and at what price.

For example, line 1 means that customer 1 bought item 1, an asus notebook, on July 16, 2020 at 8:00 for VND 10,000,000.

1.2. (Output) Contents of the statistics data

We store the statistics in an SQL table, and for orders.csv we expect the data to look like this:

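| id | customer_id | item_id | item_name           | item_price | purchase_date |
|----|-------------|---------|---------------------|------------|---------------|
| 1  | 1           | 1       | asus notebook       | 10000000   | 2020/07/16    |
| 2  | 1           | 2       | 13 inch macbook pro | 12000000   | 2020/07/16    |
| 3  | 2           | 2       | 13 inch macbook pro | 12000000   | 2020/07/20    |
| 4  | 2           | 3       | macbook pro 15 inch | 15000000   | 2020/07/20    |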

The table shows which items were purchased on each day.
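To make the per-day reading concrete, here is a small plain-Java sketch (not part of the batch) that counts the rows of the table above by purchase_date. The class and method names (DailyItemCount, countPerDay) are hypothetical, and the field types are my assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the batch: counts purchases per day
// using the four rows of the expected output table.
class DailyItemCount {
    // One row of the orders table (field types are assumptions).
    static final class OrderRow {
        final int id;
        final int customerId;
        final int itemId;
        final String itemName;
        final long itemPrice;      // VND
        final String purchaseDate; // kept as text, e.g. "2020/07/16"

        OrderRow(int id, int customerId, int itemId, String itemName,
                 long itemPrice, String purchaseDate) {
            this.id = id;
            this.customerId = customerId;
            this.itemId = itemId;
            this.itemName = itemName;
            this.itemPrice = itemPrice;
            this.purchaseDate = purchaseDate;
        }
    }

    static final List<OrderRow> SAMPLE = List.of(
            new OrderRow(1, 1, 1, "asus notebook", 10_000_000L, "2020/07/16"),
            new OrderRow(2, 1, 2, "13 inch macbook pro", 12_000_000L, "2020/07/16"),
            new OrderRow(3, 2, 2, "13 inch macbook pro", 12_000_000L, "2020/07/20"),
            new OrderRow(4, 2, 3, "macbook pro 15 inch", 15_000_000L, "2020/07/20"));

    // Groups rows by purchase date and counts them; TreeMap keeps the dates sorted.
    static Map<String, Integer> countPerDay(List<OrderRow> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (OrderRow row : rows) {
            counts.merge(row.purchaseDate, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countPerDay(SAMPLE)); // prints {2020/07/16=2, 2020/07/20=2}
    }
}
```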

Our job is to create a batch that produces this output from the input above.

Note: this task does not actually require a batch; it could also be done with a web feature.

However, since the goal is to try out Spring Batch, we assume special conditions under which a batch is required.

2. Preparation

My host environment is as follows; adapt it to your own machine:

  • OS: macOS (MacBook Pro)
  • Java

I also tried VS Code, but its automatic import organization did not work well for me, so I switched to Eclipse. IntelliJ would probably be a good IDE too, but its Community edition does not support Spring, so I only used Eclipse.

3. Implementation

3.1 Initialization

I use Spring Initializr to generate the project boilerplate: https://start.spring.io/

Options:

  • Project: Maven
    • Gradle seems to be more popular these days, but for personal reasons I chose Maven.
  • Language: Java
    • I develop in Java.
  • Dependencies: Spring Batch + MySQL Driver + Spring Data JPA
    • The core dependency of the project is Spring Batch, so we select it.
    • The database is MySQL, so we select the corresponding driver.
    • Spring Data JPA adds a data-access layer on top of the MySQL driver that makes the database more convenient to work with. I don't fully understand it yet, but the official MySQL guide recommends adding it.
  • Other options are up to you.

Then click Generate, and the project archive demo.zip is downloaded.

Unzip it into a local working directory and open a terminal there.

3.2. Create a MySQL database (with Docker)

The batch writes to a MySQL database, so we need to create one; I run it in Docker.

If you already have MySQL on the host machine, you can skip this step.

3.2.1. Create docker-compose.yml

3.2.1.1 Database structure

docker-compose.yml describes the Docker environment. I created it in the project directory with the following content:

demo/docker-compose.yml

With this configuration, the connection details from the host machine are:

  • Host: 0.0.0.0
  • Database name: demo
  • User name: demo
  • Password: demo

Directory structure:

3.2.1.2 Initialize tables

At this point the database has no tables, so the batch would have nowhere to write its data; we first create what it needs.

Connect from the host machine:

Create the necessary table:

The database is now ready. Next comes the main part: creating the batch.

3.3. Import the project

In 3.1 we generated a project named demo; next we import it into Eclipse as a Maven project.

File > Import > Maven > Existing Maven Projects.

With the project imported, we can now write the source code.

3.4 Design

The program we build will have the following architecture:

Meaning of the components:

  • BatchProcessingApplication: the program's entry point; execution starts from this class, which contains the main method. The @SpringBootApplication annotation tells the framework that this is a Spring Boot application, so Spring automatically loads the other configuration classes it needs, in particular BatchConfiguration.
  • BatchConfiguration: as the name implies, this class holds the program's configuration. Our batch invokes the components below in the order described.
  • (Job) importOrderJob: a batch can run multiple jobs, but this program is simple, so it has only one job, defined in this method, whose purpose is to import the CSV file into the database.
  • (Step) step1(): a job can contain many steps; this program has only one, named step1, although real jobs often have more.
  • (ItemReader) reader(): the basic model of a batch is "read input" -> "process input" -> "write output". step1 is divided into these three sub-steps, and the reader is the first one: reading the input.
  • (ItemProcessor) processor(): processes the input coming from reader(). My program does no special processing yet, but in practice this is where you might, for example, convert data types.
  • (ItemWriter) writer(): once the data has been processed, this saves it to the DB.
  • (Repository) OrderRepository: in Spring, saving an Entity requires a wrapper around it called a Repository; OrderRepository is the one we need to create.
  • (Entity) Order: the Entity that represents the table we store data in. Because its fields are the same as the columns of the input CSV, the class is shared by both the reader and the writer.

3.5 Write the processing code

3.5.1 Create the Entity

We create a class named Order to hold each CSV record.

It simply declares one field for each column of the CSV, which are also the columns of the orders table.

Its only methods are getters and setters.

com.example.demo.batchprocessing.Order.java

Add the @Entity annotation to tell Spring that this class is an entity mapped to a table.

Because the table we created has a plural name, we also need to pass the argument name = "orders".
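The article doesn't reproduce Order.java, so here is a minimal sketch assuming the six columns of the output table. The field types (Integer ids, Long price, String date) are my assumptions, and the JPA annotations appear only as comments so the sketch compiles without JPA on the classpath.

```java
// Minimal sketch of the Order entity (field types are assumptions).
// In the real project the JPA annotations would be active, roughly:
//   @Entity(name = "orders")   // maps the class to the plural "orders" table
//                              // (@Table(name = "orders") is the other common way)
//   class Order { @Id private Integer id; ... }
// Spring Boot's default naming strategy maps camelCase fields such as
// customerId to snake_case columns such as customer_id.
class Order {
    private Integer id;
    private Integer customerId;
    private Integer itemId;
    private String itemName;
    private Long itemPrice;      // price in VND
    private String purchaseDate; // assumption: stored as text, e.g. "2020/07/16"

    // JPA requires a no-argument constructor.
    public Order() {
    }

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public Integer getCustomerId() { return customerId; }
    public void setCustomerId(Integer customerId) { this.customerId = customerId; }

    public Integer getItemId() { return itemId; }
    public void setItemId(Integer itemId) { this.itemId = itemId; }

    public String getItemName() { return itemName; }
    public void setItemName(String itemName) { this.itemName = itemName; }

    public Long getItemPrice() { return itemPrice; }
    public void setItemPrice(Long itemPrice) { this.itemPrice = itemPrice; }

    public String getPurchaseDate() { return purchaseDate; }
    public void setPurchaseDate(String purchaseDate) { this.purchaseDate = purchaseDate; }

    public static void main(String[] args) {
        Order order = new Order();
        order.setId(1);
        order.setItemName("asus notebook");
        order.setItemPrice(10_000_000L);
        System.out.println(order.getId() + " " + order.getItemName() + " " + order.getItemPrice());
    }
}
```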

3.5.2 Create a Repository

A Repository is the structure Spring uses to persist entities to the DB; without one, we cannot perform operations on an Entity that are reflected in the DB.

com.example.demo.batchprocessing.OrderRepository

CrudRepository takes two type parameters: the first is the entity type, so we use Order; the second is the type of the primary key (id). Since orders.id is an Integer, we use Integer.

We also annotate the interface with @Repository so that Spring knows it is a repository.
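Spring Data generates the repository implementation for us, so there is nothing to run for OrderRepository directly. To show what the two type parameters mean, here is a hypothetical in-memory stand-in: SimpleCrudRepository and InMemoryRepository are invented for illustration, though save, findById, findAll and count mirror method names that CrudRepository really has.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class RepositoryDemo {
    // Hypothetical, simplified stand-in for Spring Data's CrudRepository<T, ID>:
    // T is the entity type, ID is the type of the entity's primary key.
    interface SimpleCrudRepository<T, ID> {
        T save(T entity);
        Optional<T> findById(ID id);
        Iterable<T> findAll();
        long count();
    }

    // In-memory implementation for illustration only; Spring Data builds the real one.
    static class InMemoryRepository<T, ID> implements SimpleCrudRepository<T, ID> {
        private final Map<ID, T> store = new LinkedHashMap<>();
        private final Function<T, ID> idOf; // reads the primary key from an entity

        InMemoryRepository(Function<T, ID> idOf) {
            this.idOf = idOf;
        }

        @Override
        public T save(T entity) {
            store.put(idOf.apply(entity), entity); // insert, or replace the entity with the same id
            return entity;
        }

        @Override
        public Optional<T> findById(ID id) {
            return Optional.ofNullable(store.get(id));
        }

        @Override
        public Iterable<T> findAll() {
            return new ArrayList<>(store.values());
        }

        @Override
        public long count() {
            return store.size();
        }
    }

    // Tiny entity whose primary key is an Integer, like orders.id.
    static final class Item {
        final Integer id;
        final String name;

        Item(Integer id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    public static void main(String[] args) {
        // Entity type first, key type second, mirroring CrudRepository<Order, Integer>.
        SimpleCrudRepository<Item, Integer> repo = new InMemoryRepository<>(item -> item.id);
        repo.save(new Item(1, "asus notebook"));
        repo.save(new Item(2, "13 inch macbook pro"));
        System.out.println(repo.count() + " " + repo.findById(1).map(i -> i.name).orElse("none"));
    }
}
```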

3.5.3 Create BatchConfiguration

com.example.demo.batchprocessing.BatchConfiguration

  • ItemReader<Order> reader(): we need to read the contents of a CSV file, so we create a FlatFileItemReader, an implementation of the ItemReader interface that reads a file line by line.
    • We set the source to "orders.csv" (in src/main/resources).
    • We declare the names of the fields in names().
    • Each line read from the CSV is mapped to an Order object.
  • ItemProcessor<Order, Order> processor(): handles each order read by the reader. Currently there is no processing, so it simply returns the order it received.
  • ItemWriter<Order> writer(): writes each processed order to the database. Spring Batch already provides a convenient ItemWriter implementation for this, RepositoryItemWriter; we only need to configure it with our repository.
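FlatFileItemReader does the tokenizing and mapping for us. As a rough plain-Java picture of what happens to each line (not Spring Batch code), the sketch below splits a line on commas, pairs each token with a name from a names()-style list, and passes the result through an identity processor like the one described above. The column order and the sample line (built from the first row of the output table) are assumptions, since orders.csv itself isn't reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of what a delimited reader does with each line.
// Assumption: the CSV columns are in the same order as the orders table.
class CsvLineMapping {
    static final String[] NAMES = {"id", "customerId", "itemId", "itemName", "itemPrice", "purchaseDate"};

    // Splits one line on commas and binds each token to its field name.
    // Note: a plain split does not handle quoted values containing commas.
    static Map<String, String> tokenize(String line) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty columns
        if (tokens.length != NAMES.length) {
            throw new IllegalArgumentException(
                    "Expected " + NAMES.length + " columns but got " + tokens.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], tokens[i].trim());
        }
        return fields;
    }

    // Pass-through processor: returns the item unchanged.
    static <T> T process(T item) {
        return item;
    }

    public static void main(String[] args) {
        Map<String, String> order = process(tokenize("1,1,1,asus notebook,10000000,2020/07/16"));
        System.out.println(order.get("itemName") + " @ " + Long.parseLong(order.get("itemPrice")));
    }
}
```

A real reader would then map these named fields onto an Order object, as described in the list above.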

Next we create the program's single job and that job's single step.

  • importOrderJob returns a Job configured as follows:
    • Register a listener that is notified when the job finishes: listener(listener)
    • Run step1
    • Each method declares the parameters it needs. You can build them yourself without Spring's help, or declare them as parameters and let Spring supply them. Here I want Spring to create the listener, so I declare it as a parameter.
  • step1() defines the flow from reader to writer:
    • Process records in chunks of 10: chunk(10)
    • Set the reader, processor, and writer, in that order.
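To illustrate what chunk(10) means, here is a simplified, hypothetical model of chunk-oriented processing in plain Java. It is not Spring Batch's implementation and omits transactions, retries and skips: the step reads up to chunkSize items, processes each one, and hands the whole chunk to the writer in one call. (In Spring Batch, each chunk is also committed in its own transaction.)

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.IntStream;

// Simplified, hypothetical model of a chunk-oriented step (not Spring Batch code).
class ChunkStepSketch {
    // The reader returns null when the input is exhausted; a processor that
    // returns null filters the item out. Returns the number of chunks written.
    static <I, O> int runStep(Supplier<I> reader, Function<I, O> processor,
                              Consumer<List<O>> writer, int chunkSize) {
        int chunks = 0;
        while (true) {
            // 1. Read phase: collect up to chunkSize items.
            List<I> inputs = new ArrayList<>();
            I item;
            while (inputs.size() < chunkSize && (item = reader.get()) != null) {
                inputs.add(item);
            }
            if (inputs.isEmpty()) {
                break; // nothing left to read
            }
            // 2. Process phase: transform each item, dropping filtered ones.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O output = processor.apply(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // 3. Write phase: the writer receives the whole chunk at once.
            writer.accept(outputs);
            chunks++;
            if (inputs.size() < chunkSize) {
                break; // the reader ran out mid-chunk
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<Integer> source = IntStream.rangeClosed(1, 25).boxed().iterator();
        Supplier<Integer> reader = () -> source.hasNext() ? source.next() : null;
        Function<Integer, Integer> processor = x -> x; // pass-through processor
        List<Integer> writeSizes = new ArrayList<>();
        Consumer<List<Integer>> writer = chunk -> writeSizes.add(chunk.size());

        int chunks = runStep(reader, processor, writer, 10);
        System.out.println(chunks + " chunks, sizes " + writeSizes); // prints 3 chunks, sizes [10, 10, 5]
    }
}
```

With 25 input records and a chunk size of 10, the writer is called three times; with the four orders from the table, everything fits in a single chunk.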

3.5.4 Create JobCompletionNotificationListener

We create a listener that is notified when the job is done.

com.example.demo.batchprocessing.JobCompletionNotificationListener

The listener I created only logs "!!! FINISHED" once the job has finished running, so we just need to declare a Logger and implement the afterJob() method.
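The listener code itself isn't shown, so here is a plain-Java sketch of its logic. The JobStatus enum is a hypothetical stand-in for Spring Batch's JobExecution and BatchStatus, and checking for COMPLETED before logging is an assumption borrowed from the official guide's listener. In the real class, the method is afterJob(JobExecution) from Spring Batch's JobExecutionListener.

```java
import java.util.logging.Logger;

// Plain-Java sketch of the completion listener's logic (not Spring Batch code).
class JobCompletionListenerSketch {
    // Hypothetical stand-in for the job's final status.
    enum JobStatus { COMPLETED, FAILED }

    private static final Logger log =
            Logger.getLogger(JobCompletionListenerSketch.class.getName());

    // Called once after the job ends; logs only when the job completed.
    // Returns whether it logged, so the behavior is easy to check.
    boolean afterJob(JobStatus status) {
        if (status == JobStatus.COMPLETED) {
            log.info("!!! FINISHED");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        JobCompletionListenerSketch listener = new JobCompletionListenerSketch();
        listener.afterJob(JobStatus.COMPLETED); // logs "!!! FINISHED"
        listener.afterJob(JobStatus.FAILED);    // logs nothing
    }
}
```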

3.5.5 Create BatchProcessingApplication

With the main components in place, we now create the program's entry point.

Normally we would need to declare where the batch configuration lives, invoke the batch with that configuration, and so on, but the @SpringBootApplication annotation simplifies all of this: Spring discovers the batch configuration by itself as long as it is in the same package structure. We only need to call SpringApplication.run(BatchProcessingApplication.class, args).

3.5.6 Remove the default classes

The template generated by Spring Initializr already contains an entry point class and a matching test class. If we keep them, running the batch reports an error because it cannot pick the right entry point, so we delete this default entry point.

We then create a new, simple unit test for the program:

com.example.demo.batchprocessing.test.DemoApplicationTests.java

That completes the code!

4. Run the program

4.1 Start the database

We start the database with Docker:

4.2 Run the batch via the command line

You can run it from the command line or from the IDE; either works. From the command line, I run the following:

The first run may take a while to download the dependencies, but from the second run onwards it is much faster.

The output of the run is shown below:

Check the database to see whether the orders have been created.

The records match the contents of the CSV, so the batch did its job correctly.

5. Result

In this article I have shared how I implemented a simple batch program that reads a CSV file and inserts it into the database. I hope it helps you get started with Spring Batch in Spring Boot.

The article may not cover everything needed to run the program; for the full code, please refer to the following repository.

https://github.com/mytv1/sample-spring-batch

The article may have many shortcomings, and I welcome your suggestions.

The end.

6. References

https://spring.io/guides/gs/batch-processing

https://spring.io/guides/gs/accessing-data-mysql


Source: Viblo