Partitioning in Hive

Tram Ho

Having finished the series on basic table operations in Hive, let's move on to partitioning in Hive. This article builds on those basic table operations, so it is fairly easy to follow along.

Hive organizes tables into partitions. Partitioning is a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. With partitions, it is easy to query just a portion of the data.

Tables or partitions can be further divided into buckets, which give the data additional structure that can be used for more efficient queries. Bucketing works on the hash value of one or more table columns.
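As a minimal sketch of bucketing, the statement below declares a table whose rows are spread over a fixed number of buckets by hashing one column. The table name employee_bucketed, the column types, and the bucket count of 4 are assumptions chosen for illustration only.

```sql
-- Hypothetical table name and column types, for illustration only.
CREATE TABLE employee_bucketed (
  id   INT,
  name STRING,
  dept STRING,
  yoj  INT
)
-- Each row is assigned to a bucket by hashing id modulo the bucket count.
CLUSTERED BY (id) INTO 4 BUCKETS;
```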

For example, a table named Tab1 contains employee data such as id, name, dept, and yoj (year of joining). Suppose you need to retrieve the details of all employees who joined in 2012. Without partitioning, the query scans the entire table for the required information. However, if you partition the employee data by year and store each year in a separate file, query processing time is reduced. The following example shows how to partition a file and its data:

The following file contains the employeedata table:

/tab1/employeedata/file1

The above data is partitioned into 2 files by year:

/tab1/employeedata/2012/file2

/tab1/employeedata/2013/file3
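In HiveQL, the same idea could be sketched roughly as follows. This is an illustration under stated assumptions, not the original listing: the column types are guesses, and yoj is declared in PARTITIONED BY because Hive does not allow a partition column to also appear in the regular column list. Note also that Hive names partition directories in the form yoj=2012 by default, rather than plain 2012.

```sql
-- Assumed schema based on the columns named above (id, name, dept, yoj).
-- The partition column yoj lives in PARTITIONED BY, not in the column list.
CREATE TABLE tab1 (
  id   INT,
  name STRING,
  dept STRING
)
PARTITIONED BY (yoj INT);

-- Filtering on the partition column lets Hive read only the
-- matching partition directory instead of the whole table.
SELECT id, name, dept
FROM tab1
WHERE yoj = 2012;
```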

1. Add a partition

We can add partitions to a table by altering it with ALTER TABLE. Suppose we have a table called employee with the fields Id, Name, Salary, Designation, Dept, and yoj.


The following query is used to add a partition to the employee table.
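A minimal sketch of such a query is shown here. It assumes the employee table was created with PARTITIONED BY (year STRING); the partition column name year, the value '2013', and the LOCATION path are all hypothetical.

```sql
-- General form:
--   ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec
--     [LOCATION 'location'];
-- Assumption: employee is partitioned by a STRING column named year.
ALTER TABLE employee
ADD IF NOT EXISTS PARTITION (year = '2013')
LOCATION '/user/hive/warehouse/employee/year=2013';  -- hypothetical path; LOCATION is optional
```

Running SHOW PARTITIONS employee; afterwards lists the partitions the table currently has.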

2. Rename the partition


The following query is used to rename a partition:
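As a hedged example, again assuming employee is partitioned by a STRING column named year and already has a '2013' partition, a rename could look like this. Renaming changes the partition's value, not the name of the partition column.

```sql
-- General form:
--   ALTER TABLE table_name PARTITION partition_spec
--     RENAME TO PARTITION partition_spec;
-- Assumption: the partition year='2013' exists on employee.
ALTER TABLE employee
PARTITION (year = '2013')
RENAME TO PARTITION (year = '2014');
```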

3. Delete a partition

Syntax for deleting a partition: ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec;

The following query is used to delete a partition:
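A minimal sketch, assuming employee is partitioned by a STRING column named year (a hypothetical name) and that the '2013' partition is the one to remove:

```sql
-- IF EXISTS avoids an error when the partition is not present.
-- For a managed (non-external) table, dropping the partition also deletes its data.
ALTER TABLE employee
DROP IF EXISTS PARTITION (year = '2013');
```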

That's all. This is one of the shortest posts in the Hive series. Wishing everyone a happy weekend ^^.

Source : Viblo