After finishing the series of basic operations with tables in Hive, let’s come to the tutorial on partitioning in Hive, this article uses basic steps to manipulate tables so it is quite easy, along with Watch it.
Hive organizes tables into partitions. It is a way of dividing the table into related parts based on the values of the partitioned columns like date, city and department. Using partition, easy to query part of data.
The tables or partitions are divided into buckets, to provide additional structure for data that can be used for more efficient queries. Buckets work based on the hash value of some table columns.
For example, a table named Tab1 contains employee data such as id, name, dept, and yoj (year of joining). Suppose you need to retrieve the details of all employees who participated in 2012. A query searches the entire table for required information. However, if you partition employee data by year and store it in a separate file, this will reduce query processing time. The following example shows how to partition a file and its data:
The following file contains employeedata table. / tab1 / employeedata / file1
1 2 3 4 5 6 | id, name, dept, yoj 1, gopal, TP, 2012 2, kiran, HR, 2012 3, kaleel,SC, 2013 4, Prasanth, SC, 2013 |
The above data is partitioned into 2 files using year
/ tab1 / employeedata / 2012 / file2
1 2 3 | 1, gopal, TP, 2012 2, kiran, HR, 2012 |
/ tab1 / employeedata / 2013 / file3
1 2 3 | 3, kaleel,SC, 2013 4, Prasanth, SC, 2013 |
1. Add a partition
We can add partitions to the table by changing the table. Suppose we have a table called employee with fields like Id, Name, Salary, Designation, Dept, and yoj.
Syntax:
1 2 3 4 5 6 | ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...; partition_spec: : (p_column = p_col_value, p_column = p_col_value, ...) |
The following query is used to add a partition to the employee table.
1 2 3 4 | hive> ALTER TABLE employee > ADD PARTITION (year=’2012’) > location '/2012/part2012'; |
2. Rename the partition
Syntax:
1 2 | ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec; |
The query is used to rename a partition:
1 2 3 | hive> ALTER TABLE employee PARTITION (year=’1203’) > RENAME TO PARTITION (Yoj=’1203’); |
3. Delete a partition
Syntax delete 1 partition:
1 2 | ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...; |
The following query is used to delete a partition:
1 2 3 | hive> ALTER TABLE employee DROP [IF EXISTS] > PARTITION (year=’1203’); |
That’s all. One of the shortest tracks of the Hive series. Wishing everyone a happy weekend ^^.