1. Built-in Functions
Hive supports the following built-in functions:
Return type | Syntax | Description |
---|---|---|
BIGINT | round (double a) | Returns the rounding BIGINT value of double |
BIGINT | floor (double a) | Returns the maximum value BIGINT equal to or less than double |
BIGINT | ceil (double a) | Returns the smallest value BIGINT equal to or greater than double |
double | rand (), rand (int seed) | Returns a random value that varies from row to row |
string | concat (string A, string B, …) | Returns the result of concatenating strings, B after A |
string | substr (string A, int start) | Returns the substring of A starting from the starting position until the end of string A. |
string | substr (string A, int start, int length) | Returns the substring of A starting at the starting position of length |
string | upper (string A) | Returns the result string from converting all the letters of A to uppercase |
string | ucase (string A) | Same as above |
string | lower (string A) | Returns the result string from converting all the characters of A to lowercase |
string | lcase (string A) | Same as above |
string | trim (string A) | Returns the result string from cutting space characters from both ends of A. |
string | ltrim (string A) | Returns the result string from truncating space characters from the left end of A |
string | rtrim (string A) | Returns the string resulting from truncating space characters from the right end of A |
string | regexp_replace (string A, string B, string C) | It returns the result string from replacing all substrings in B that match the Java regular expression syntax with C. |
int | size (Map <KV>) | Returns the number of elements in map type |
int | size (Array <T>) | Returns the number of elements in the array type |
value of | cast (<expr> as <type>) | It converts the result of the expression expr into <type> eg cast (‘1’ as BIGINT) converts the string ‘1’ into an integral representation. A NULL is returned if the conversion fails. |
string | from_unixtime (int unixtime) | Converts the number of seconds in Unix time (landmark 1970-01-01 00:00:00 UTC) to a string representing the current timezone current time as “1970-01-01 00:00:00” |
string | to_date (string timestamp) | Returns the date part of the string: to_date (“1970-01-01 00:00:00”) = “1970-01-01” |
int | year (string date) | Returns the fifth of date: year (“1970-01-01 00:00:00”) = 1970, year (“1970-01-01”) = 1970 |
int | month (string date) | Returns the month part of date: month (“1970-11-01 00:00:00”) = 11, month (“1970-11-01”) = 11 |
int | day (string date) | Returns the day of the day: day (“1970-11-01 00:00:00”) = 1, day (“1970-11-01”) = 1 |
string | get_json_object (string json_string, string path) | Extract json object from json string based on the specified json path and return json string of extracted json object. It returns NULL if the input json string is invalid. |
For example
The following query illustrates some of the built-in built-in functions:
round ()
1 2 | hive> SELECT round(2.6) from temp; |
When the query is successfully executed, you will see the following response:
1 2 | 3.0 |
floor ()
1 2 | hive> SELECT floor(2.6) from temp; |
When the query is successfully executed, you will receive the following response:
1 2 | 2.0 |
ceil ()
1 2 | hive> SELECT ceil(2.6) from temp; |
When the query is successfully executed, you will see the following response:
1 2 | 3.0 |
2. Aggregate Functions
Hive supports the following available aggregate functions. Using these functions is similar to aggregates in SQL
Return type | Syntax | Description |
---|---|---|
BiGINT | count (*), count (expr), | count (*) – Returns the total number of columns retrieved |
DOUBLE | sum (col), sum (DISTINCT col) | Returns the sum of the members of the group or the sum of the individual values in the group’s column |
DOUBLE | avg (col), avg (DISTINCT col) | Returns the average of the elements in the group or the average of the individual values of the column in the group |
DOUBLE | min (col) | Returns the smallest value of the column in the group |
DOUBLE | max (col) | Returns the highest value of the column in the group |