This page describes the different clients supported by HiveServer2.
Version
Introduced in Hive version 0.11. See HIVE-2935.
Beeline – Command Line Shell
HiveServer2 supports Beeline, a command shell that works with HiveServer2. It's a JDBC client based on the SQLLine CLI (http://sqlline.sourceforge.net/). The detailed SQLLine documentation also applies to Beeline.
Replacing the Implementation of Hive CLI Using Beeline
The Beeline shell works in both embedded mode and remote mode. In embedded mode, it runs an embedded Hive (similar to Hive CLI), whereas remote mode is for connecting to a separate HiveServer2 process over Thrift. Starting in Hive 0.14, when Beeline is used with HiveServer2, it also prints the log messages from HiveServer2 for queries it executes to STDERR.
In remote mode HiveServer2 only accepts valid Thrift calls – even in HTTP mode, the message body contains Thrift payloads.
Beeline Example
% bin/beeline
Hive version 0.11.0-SNAPSHOT by Apache
beeline> !connect jdbc:hive2://localhost:10000 scott tiger org.apache.hive.jdbc.HiveDriver
!connect jdbc:hive2://localhost:10000 scott tiger org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:10000
Connected to: Hive (version 0.10.0)
Driver: Hive (version 0.10.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show tables;
show tables;
+-------------------+
| tab_name          |
+-------------------+
| primitives        |
| src               |
| src1              |
| src_json          |
| src_sequencefile  |
| src_thrift        |
| srcbucket         |
| srcbucket2        |
| srcpart           |
+-------------------+
9 rows selected (1.079 seconds)
Beeline with NoSASL connection
If you'd like to connect via NOSASL mode, you must specify the authentication mode explicitly:
% bin/beeline
beeline> !connect jdbc:hive2://<host>:<port>/<db>;auth=noSasl hiveuser pass org.apache.hive.jdbc.HiveDriver
Beeline Commands
Command | Description |
---|---|
!<SQLLine command> | List of SQLLine commands available at http://sqlline.sourceforge.net/. Example: !quit exits the Beeline client. |
Beeline Hive Commands
Hive-specific commands (the same as Hive CLI commands) can be run from Beeline when the Hive JDBC driver is used.
Use ";" (semicolon) to terminate commands. Comments in scripts can be specified using the "--" prefix.
Command | Description |
---|---|
reset | Resets the configuration to the default values. |
set <key>=<value> | Sets the value of a particular configuration variable (key). |
set | Prints a list of configuration variables that are overridden by the user or Hive. |
set -v | Prints all Hadoop and Hive configuration variables. |
add FILE[S] <filepath> <filepath>*, add JAR[S] <filepath> <filepath>*, add ARCHIVE[S] <filepath> <filepath>* | Adds one or more files, jars, or archives to the list of resources in the distributed cache. See Hive Resources for more information. |
add FILE[S] <ivyurl> <ivyurl>*, add JAR[S] <ivyurl> <ivyurl>*, add ARCHIVE[S] <ivyurl> <ivyurl>* | As of Hive 1.2.0, adds one or more files, jars, or archives to the list of resources in the distributed cache using an Ivy URL of the form ivy://group:module:version?query_string. See Hive Resources for more information. |
list FILE[S] | Lists the resources already added to the distributed cache. See Hive Resources for more information. (As of Hive 0.14.0: HIVE-7592). |
list FILE[S] <filepath>* | Checks whether the given resources are already added to the distributed cache or not. See Hive Resources for more information. |
delete FILE[S] <filepath>* | Removes the resource(s) from the distributed cache. |
delete FILE[S] <ivyurl> <ivyurl>*, delete JAR[S] <ivyurl> <ivyurl>*, delete ARCHIVE[S] <ivyurl> <ivyurl>* | As of Hive 1.2.0, removes the resource(s) that were added using the <ivyurl> from the distributed cache. See Hive Resources for more information. |
reload | As of Hive 0.14.0, makes HiveServer2 aware of any jar changes in the path specified by the configuration parameter hive.reloadable.aux.jars.path (without needing to restart HiveServer2). The changes can be adding, removing, or updating jar files. |
dfs <dfs command> | Executes a dfs command. |
<query string> | Executes a Hive query and prints results to standard output. |
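Because commands end with ";" and "--" starts a comment, a JDBC program that runs a Beeline-style script has to split it into individual commands first. The sketch below shows one way to do that. HiveScriptSplitter is a hypothetical helper, not part of Beeline or the Hive JDBC driver, and it only handles quoted strings, backquoted identifiers, and backslash escapes, not every HiveQL lexical rule.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Beeline): splits a HiveQL script into commands.
 * Commands end with ';' and "--" starts a comment that runs to the end of the line.
 * Semicolons and "--" inside '...', "..." or `...` are kept as part of the command.
 */
final class HiveScriptSplitter {

    static List<String> split(String script) {
        List<String> commands = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // active quote character, or 0 when outside quotes
        int i = 0;
        while (i < script.length()) {
            char c = script.charAt(i);
            if (quote != 0) {
                current.append(c);
                if (c == '\\' && quote != '`' && i + 1 < script.length()) {
                    // Keep the escaped character verbatim, e.g. 'it\'s'.
                    current.append(script.charAt(i + 1));
                    i += 2;
                    continue;
                }
                if (c == quote) {
                    quote = 0; // closing quote
                }
                i++;
            } else if (c == '\'' || c == '"' || c == '`') {
                quote = c; // opening quote
                current.append(c);
                i++;
            } else if (c == '-' && i + 1 < script.length() && script.charAt(i + 1) == '-') {
                // Skip the comment up to (not including) the newline.
                while (i < script.length() && script.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == ';') {
                addIfNotBlank(commands, current);
                current.setLength(0);
                i++;
            } else {
                current.append(c);
                i++;
            }
        }
        addIfNotBlank(commands, current); // this sketch also keeps a trailing command without ';'
        return commands;
    }

    private static void addIfNotBlank(List<String> commands, StringBuilder sb) {
        String command = sb.toString().trim();
        if (!command.isEmpty()) {
            commands.add(command);
        }
    }
}
```

Each returned string could then be passed to Statement.execute() on a Hive JDBC connection, assuming (as stated above) that Hive-specific commands such as set are accepted through the Hive JDBC driver.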
Beeline Command Options
The Beeline CLI supports these command line options:
Option | Description |
---|---|
-u <database URL> | The JDBC URL to connect to. Usage: beeline -u <database URL> |
-n <username> | The username to connect as. Usage: beeline -n <username> |
-p <password> | The password to connect as. Usage: beeline -p <password> |
-d <driver class> | The driver class to use. Usage: beeline -d <driver class> |
-e <query> | Query that should be executed. Double or single quotes enclose the query string. This option can be specified multiple times. Usage: beeline -e "<query>". Support for running multiple SQL statements separated by semicolons in a single query string: 1.2.0 (HIVE-9877) |
-f <file> | Script file that should be executed. Usage: beeline -f <file>. Version: 0.12.0 (HIVE-4268) |
--hiveconf property=value | Use value for the given configuration property. Properties that are listed in hive.conf.restricted.list cannot be reset with hiveconf (see Restricted List and Whitelist). Usage: beeline --hiveconf property=value. Version: 0.13.0 (HIVE-6173) |
--hivevar name=value | Hive variable name and value. This is a Hive-specific setting in which variables can be set at the session level and referenced in Hive commands or queries. Usage: beeline --hivevar name=value |
--color=[true/false] | Control whether color is used for display. Default is false. Usage: beeline --color=true (Not supported for Separated-Value Output formats. See HIVE-9770) |
--showHeader=[true/false] | Show column names in query results (true) or not (false). Default is true. Usage: beeline --showHeader=false |
--headerInterval=ROWS | The interval for redisplaying column headers, in number of rows, when outputformat is table. Default is 100. Usage: beeline --headerInterval=ROWS (Not supported for Separated-Value Output formats. See HIVE-9770) |
--fastConnect=[true/false] | When connecting, skip building a list of all tables and columns for tab-completion of HiveQL statements (true) or build the list (false). Default is true. Usage: beeline --fastConnect=false |
--autoCommit=[true/false] | Enable/disable automatic transaction commit. Default is false. Usage: beeline --autoCommit=true |
--verbose=[true/false] | Show verbose error messages and debug information (true) or do not show (false). Default is false. Usage: beeline --verbose=true |
--showWarnings=[true/false] | Display warnings that are reported on the connection after issuing any HiveQL commands. Default is false. Usage: beeline --showWarnings=true |
--showNestedErrs=[true/false] | Display nested errors. Default is false. Usage: beeline --showNestedErrs=true |
--numberFormat=[pattern] | Format numbers using a DecimalFormat pattern. Usage: beeline --numberFormat=<pattern> |
--force=[true/false] | Continue running script even after errors (true) or do not continue (false). Default is false. Usage: beeline --force=true |
--maxWidth=MAXWIDTH | The maximum width to display before truncating data, in characters, when outputformat is table. Default is to query the terminal for current width, then fall back to 80. Usage: beeline --maxWidth=MAXWIDTH |
--maxColumnWidth=MAXCOLWIDTH | The maximum column width, in characters, when outputformat is table. Default is 15. Usage: beeline --maxColumnWidth=MAXCOLWIDTH |
--silent=[true/false] | Reduce the amount of informational messages displayed (true) or not (false). It also stops displaying the log messages for the query from HiveServer2 (Hive 0.14 and later) and the HiveQL commands (Hive 1.2.0 and later). Default is false. Usage: beeline --silent=true |
--autosave=[true/false] | Automatically save preferences (true) or do not autosave (false). Default is false. Usage: beeline --autosave=true |
--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2] | Format mode for result display. Default is table. See Separated-Value Output Formats below for a description of the recommended SV options. Usage: beeline --outputformat=csv2. Version: dsv/csv2/tsv2 added in 0.14.0 (HIVE-8615) |
--truncateTable=[true/false] | If true, truncates table columns in the console when they exceed the console length. Version: 0.14.0 (HIVE-6928) |
--delimiterForDSV=DELIMITER | The delimiter for the delimiter-separated values output format. Default is the '|' character. Version: 0.14.0 (HIVE-7390) |
--isolation=LEVEL | Set the transaction isolation level, for example TRANSACTION_READ_COMMITTED. Usage: beeline --isolation=LEVEL |
--nullemptystring=[true/false] | Use historic behavior of printing null as empty string (true) or use current behavior of printing null as NULL (false). Default is false. Usage: beeline --nullemptystring=false. Version: 0.13.0 (HIVE-4485) |
--incremental=[true/false] | Print output incrementally. When results are large the |
--help | Display a usage message. Usage: beeline --help |
Separated-Value Output Formats
Starting with Hive 0.14, improved SV output formats are available, namely DSV, CSV2, and TSV2. These conform better to the standard CSV convention, which adds quotes around a cell value only if it contains special characters (such as the delimiter character or a quote character) or spans multiple lines. These three formats differ only in the delimiter between cells: comma for CSV2, tab for TSV2, and configurable for DSV (delimiterForDSV property).
The CSV and TSV output formats are maintained for backward compatibility, but beware: they add single-quote characters around all cell values, contrary to this convention.
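To make the difference concrete, the following sketch contrasts the two quoting styles. SvQuotingSketch is a hypothetical, simplified illustration of the convention, not Beeline's own formatter: standard quoting wraps a value in double quotes only when it contains the delimiter, a double quote, or a line break (doubling embedded quotes), while the legacy style is modeled as wrapping every value in single quotes with no escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of csv2/tsv2/dsv quoting versus legacy csv/tsv quoting. */
final class SvQuotingSketch {

    /** Standard convention: quote only when needed, doubling embedded double quotes. */
    static String quoteStandard(String value, char delimiter) {
        boolean needsQuotes = value.indexOf(delimiter) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    /** Legacy style (simplified): every value wrapped in single quotes, no escaping. */
    static String quoteLegacy(String value) {
        return "'" + value + "'";
    }

    /** Formats one row; use ',' for csv/csv2 and a tab character for tsv/tsv2. */
    static String row(List<String> cells, char delimiter, boolean legacy) {
        return cells.stream()
                .map(cell -> legacy ? quoteLegacy(cell) : quoteStandard(cell, delimiter))
                .collect(Collectors.joining(String.valueOf(delimiter)));
    }
}
```

For the cells 1, a,b and say "hi", the standard style yields 1,"a,b","say ""hi""", whereas the legacy style quotes every cell, including the plain 1.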
HiveServer2 Logging
Starting with Hive 0.14.0, HiveServer2 operation logs are available for Beeline clients. These parameters configure logging:
- hive.server2.logging.operation.enabled
- hive.server2.logging.operation.log.location
- hive.server2.logging.operation.verbose (Hive 0.14 to 1.1)
- hive.server2.logging.operation.level (Hive 1.2 onward)
HIVE-11488 (Hive 2.0.0) adds support for logging queryId and sessionId to the HiveServer2 log file. To enable it, add %X{queryId} and %X{sessionId} to the pattern format string of the logging configuration file.
JDBC
HiveServer2 has a JDBC driver. It supports both embedded and remote access to HiveServer2.
Connection URLs
Connection URL Format
The HiveServer2 URL is a string with the following syntax:
jdbc:hive2://<host1>:<port1>,<host2>:<port2>/dbName;sess_var_list?hive_conf_list#hive_var_list
where
- <host1>:<port1>,<host2>:<port2> is a server instance or a comma-separated list of server instances to connect to (if dynamic service discovery is enabled). If empty, the embedded server will be used.
- dbName is the name of the initial database.
- sess_var_list is a semicolon-separated list of key=value pairs of session variables (e.g., user=foo;password=bar).
- hive_conf_list is a semicolon-separated list of key=value pairs of Hive configuration variables for this session.
- hive_var_list is a semicolon-separated list of key=value pairs of Hive variables for this session.
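The following sketch shows how the separators delimit each part of the URL. HiveUrlParts is a hypothetical, simplified parser written for illustration; it is not the driver's own parsing code and does no validation or decoding.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical parser (not the Hive JDBC driver's) for
 * jdbc:hive2://<host1>:<port1>,<host2>:<port2>/dbName;sess_var_list?hive_conf_list#hive_var_list
 */
final class HiveUrlParts {
    static final String PREFIX = "jdbc:hive2://";

    final List<String> hosts;              // empty list means embedded mode
    final String dbName;                   // empty when not given
    final Map<String, String> sessionVars; // sess_var_list
    final Map<String, String> hiveConfs;   // hive_conf_list
    final Map<String, String> hiveVars;    // hive_var_list

    private HiveUrlParts(List<String> hosts, String dbName, Map<String, String> sessionVars,
                         Map<String, String> hiveConfs, Map<String, String> hiveVars) {
        this.hosts = hosts;
        this.dbName = dbName;
        this.sessionVars = sessionVars;
        this.hiveConfs = hiveConfs;
        this.hiveVars = hiveVars;
    }

    boolean isEmbedded() {
        return hosts.isEmpty();
    }

    static HiveUrlParts parse(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a HiveServer2 URL: " + url);
        }
        String rest = url.substring(PREFIX.length());

        // Split off the trailing sections in order: '#', then '?', then ';'.
        String varPart = "";
        int hash = rest.indexOf('#');
        if (hash >= 0) {
            varPart = rest.substring(hash + 1);
            rest = rest.substring(0, hash);
        }
        String confPart = "";
        int question = rest.indexOf('?');
        if (question >= 0) {
            confPart = rest.substring(question + 1);
            rest = rest.substring(0, question);
        }
        String sessPart = "";
        int semi = rest.indexOf(';');
        if (semi >= 0) {
            sessPart = rest.substring(semi + 1);
            rest = rest.substring(0, semi);
        }

        // What is left is "<hosts>/dbName" or just "<hosts>".
        String hostPart = rest;
        String db = "";
        int slash = rest.indexOf('/');
        if (slash >= 0) {
            hostPart = rest.substring(0, slash);
            db = rest.substring(slash + 1);
        }
        List<String> hosts = hostPart.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(hostPart.split(","));
        return new HiveUrlParts(hosts, db, pairs(sessPart), pairs(confPart), pairs(varPart));
    }

    /** Parses "k1=v1;k2=v2" into an insertion-ordered map. */
    private static Map<String, String> pairs(String list) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String pair : list.split(";")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            map.put(eq < 0 ? pair : pair.substring(0, eq), eq < 0 ? "" : pair.substring(eq + 1));
        }
        return map;
    }
}
```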
Connection URL for Remote or Embedded Mode
The JDBC connection URL format has the prefix jdbc:hive2:// and the driver class is org.apache.hive.jdbc.HiveDriver. Note that this is different from the old HiveServer.
- For a remote server, the URL format is jdbc:hive2://<host>:<port>/<db> (default port for HiveServer2 is 10000).
- For an embedded server, the URL format is jdbc:hive2:// (no host or port).
Connection URL When HiveServer2 Is Running in HTTP Mode
JDBC connection URL: jdbc:hive2://<host>:<port>/<db>;transportMode=http;httpPath=<http_endpoint>, where:
- <http_endpoint> is the corresponding HTTP endpoint configured in hive-site.xml. Default value is cliservice.
- Default port for HTTP transport mode is 10001.
Versions earlier than 0.14
In versions earlier than 0.14, these parameters used to be called hive.server2.transport.mode and hive.server2.thrift.http.path respectively, and were part of the hive_conf_list. These older names have been deprecated in favour of the new ones (which are part of the sess_var_list) but continue to work for now.
Connection URL When SSL Is Enabled in HiveServer2
JDBC connection URL: jdbc:hive2://<host>:<port>/<db>;ssl=true;sslTrustStore=<trust_store_path>;trustStorePassword=<trust_store_password>, where:
- <trust_store_path> is the path where the client's truststore file lives.
- <trust_store_password> is the password to access the truststore.
In HTTP mode: jdbc:hive2://<host>:<port>/<db>;ssl=true;sslTrustStore=<trust_store_path>;trustStorePassword=<trust_store_password>;transportMode=http;httpPath=<http_endpoint>.
For versions earlier than 0.14, see the version note above.
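As a sketch of how the HTTP-mode and SSL session variables combine, the hypothetical builder below (not part of the Hive JDBC driver) appends them to a remote URL in the order used above. Host names, the truststore path, and the password in the example are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical URL builder: appends the HTTP-mode and SSL session variables
 * to a remote HiveServer2 URL. Values are appended verbatim, so they must not
 * contain ';', '?' or '#'.
 */
final class Hs2UrlBuilder {
    private final String host;
    private final int port;
    private final String db;
    private final Map<String, String> sessionVars = new LinkedHashMap<>();

    Hs2UrlBuilder(String host, int port, String db) {
        this.host = host;
        this.port = port;
        this.db = db;
    }

    /** SSL with a client truststore; usable in binary and HTTP transport mode. */
    Hs2UrlBuilder ssl(String trustStorePath, String trustStorePassword) {
        sessionVars.put("ssl", "true");
        sessionVars.put("sslTrustStore", trustStorePath);
        sessionVars.put("trustStorePassword", trustStorePassword);
        return this;
    }

    /** HTTP transport mode, using the Hive 0.14+ session variable names. */
    Hs2UrlBuilder httpMode(String httpPath) {
        sessionVars.put("transportMode", "http");
        sessionVars.put("httpPath", httpPath);
        return this;
    }

    String build() {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port).append('/').append(db);
        for (Map.Entry<String, String> e : sessionVars.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }
}
```

Calling only httpMode("cliservice") with port 10001 gives the plain HTTP-mode URL; calling only ssl(...) gives the binary-mode SSL URL.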
Using JDBC
You can use JDBC to access data stored in a relational database or other tabular format.
- Load the HiveServer2 JDBC driver. As of 1.2.0, applications no longer need to explicitly load JDBC drivers using Class.forName().
  For example: Class.forName("org.apache.hive.jdbc.HiveDriver");
- Connect to the database by creating a Connection object with the JDBC driver.
  For example: Connection cnct = DriverManager.getConnection("jdbc:hive2://<host>:<port>", "<user>", "<password>");
  The default <port> is 10000. In non-secure configurations, specify a <user> for the query to run as. The <password> field value is ignored in non-secure mode:
  Connection cnct = DriverManager.getConnection("jdbc:hive2://<host>:<port>", "<user>", "");
  In Kerberos secure mode, the user information is based on the Kerberos credentials.
- Submit SQL to the database by creating a Statement object and using its executeQuery() method.
  For example: Statement stmt = cnct.createStatement(); ResultSet rset = stmt.executeQuery("SELECT foo FROM bar");
- Process the result set, if necessary.
These steps are illustrated in the sample code below.
JDBC Client Sample Code
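A minimal sketch of the steps above, assuming the Hive JDBC jar (which provides org.apache.hive.jdbc.HiveDriver) is on the classpath and HiveServer2 listens on localhost:10000 in non-secure mode. The table bar, column foo, and user hive are placeholders, and HiveJdbcClientSketch is a hypothetical class name.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client following the four JDBC steps described above. */
final class HiveJdbcClientSketch {

    /** Remote-mode URL in the jdbc:hive2://<host>:<port>/<db> format. */
    static String remoteUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    static List<String> selectFoo(String url, String user, String password) throws SQLException {
        // Step 1, needed only before Hive 1.2.0:
        // Class.forName("org.apache.hive.jdbc.HiveDriver");
        List<String> values = new ArrayList<>();
        // Steps 2 and 3: connect, then submit SQL; try-with-resources closes all three objects.
        try (Connection cnct = DriverManager.getConnection(url, user, password);
             Statement stmt = cnct.createStatement();
             ResultSet rset = stmt.executeQuery("SELECT foo FROM bar")) {
            // Step 4: process the result set.
            while (rset.next()) {
                values.add(rset.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) throws SQLException {
        // Non-secure mode: the user is who the query runs as; the password is ignored.
        String url = remoteUrl("localhost", 10000, "default");
        for (String value : selectFoo(url, "hive", "")) {
            System.out.println(value);
        }
    }
}
```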
Running the JDBC Sample Code
Alternatively, you can run the following bash script, which will seed the data file and build your classpath before invoking the client. The script adds all the additional jars needed for using HiveServer2 in embedded mode as well.
JDBC Data Types
The following table lists the data types implemented for HiveServer2 JDBC.
Hive Type | Java Type | Specification |
---|---|---|
TINYINT | byte | signed or unsigned 1-byte integer |
SMALLINT | short | signed 2-byte integer |
INT | int | signed 4-byte integer |
BIGINT | long | signed 8-byte integer |
FLOAT | double | single-precision number (approximately 7 digits) |
DOUBLE | double | double-precision number (approximately 15 digits) |
DECIMAL | java.math.BigDecimal | fixed-precision decimal value |
BOOLEAN | boolean | a single bit (0 or 1) |
STRING | String | character string or variable-length character string |
TIMESTAMP | java.sql.Timestamp | date and time value |
BINARY | String | binary data |
Complex Types | | |
ARRAY | String (JSON-encoded) | values of one data type |
MAP | String (JSON-encoded) | key-value pairs |
STRUCT | String (JSON-encoded) | structured values |
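The sketch below reads a column with the getter that matches the Java type in the table. HiveColumnReader is hypothetical; it assumes the Hive type name comes from ResultSetMetaData.getColumnTypeName(), strips parameters such as decimal(10,2) or array<string> in case they are present, and falls back to getString() for types the table does not list.

```java
import java.math.BigDecimal;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Locale;

/** Hypothetical reader that applies the Hive-to-Java type table above. */
final class HiveColumnReader {

    /** Java type from the table; complex types arrive as JSON-encoded strings. */
    static Class<?> javaTypeFor(String hiveTypeName) {
        switch (baseName(hiveTypeName)) {
            case "TINYINT":   return Byte.class;
            case "SMALLINT":  return Short.class;
            case "INT":       return Integer.class;
            case "BIGINT":    return Long.class;
            case "FLOAT":
            case "DOUBLE":    return Double.class;
            case "DECIMAL":   return BigDecimal.class;
            case "BOOLEAN":   return Boolean.class;
            case "TIMESTAMP": return Timestamp.class;
            default:          return String.class; // STRING, BINARY, ARRAY, MAP, STRUCT, others
        }
    }

    /** "decimal(10,2)" becomes "DECIMAL", "map<string,int>" becomes "MAP". */
    static String baseName(String hiveTypeName) {
        String upper = hiveTypeName.trim().toUpperCase(Locale.ROOT);
        int cut = upper.length();
        int paren = upper.indexOf('(');
        int angle = upper.indexOf('<');
        if (paren >= 0) cut = Math.min(cut, paren);
        if (angle >= 0) cut = Math.min(cut, angle);
        return upper.substring(0, cut);
    }

    static Object read(ResultSet rs, int column, String hiveTypeName) throws SQLException {
        Class<?> type = javaTypeFor(hiveTypeName);
        Object value;
        if (type == Byte.class) value = rs.getByte(column);
        else if (type == Short.class) value = rs.getShort(column);
        else if (type == Integer.class) value = rs.getInt(column);
        else if (type == Long.class) value = rs.getLong(column);
        else if (type == Double.class) value = rs.getDouble(column);
        else if (type == BigDecimal.class) value = rs.getBigDecimal(column);
        else if (type == Boolean.class) value = rs.getBoolean(column);
        else if (type == Timestamp.class) value = rs.getTimestamp(column);
        else value = rs.getString(column);
        // Primitive getters return 0 or false for SQL NULL, so check wasNull().
        return rs.wasNull() ? null : value;
    }
}
```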
JDBC Client Setup for a Secure Cluster
When connecting to HiveServer2 with Kerberos authentication, the URL format is:
jdbc:hive2://<host>:<port>/<db>;principal=<Server_Principal_of_HiveServer2>
The client needs to have a valid Kerberos ticket in the ticket cache before connecting.
NOTE: If you don't have a "/" after the port number, the JDBC driver does not parse the hostname and ends up running HS2 in embedded mode. So if you are specifying a hostname, make sure you have a "/" or "/<dbname>" after the port number.
In the case of LDAP, CUSTOM or PAM authentication, the client needs to pass a valid user name and password to the JDBC connection API.
To use sasl.qop, add it to the session variable part (sess_var_list) of your Hive JDBC connection string, for example:
jdbc:hive2://hostname/dbname;sasl.qop=auth-int
For more information, see Setting Up HiveServer2.
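The hypothetical helper below builds a Kerberos URL that always has "/" and a database after the port, optionally adds sasl.qop as a session variable, and checks an existing URL for the missing "/" described in the NOTE above. KerberosUrlSketch is not part of the Hive JDBC driver, and the server principal in the example is a placeholder; use the principal configured for your HiveServer2.

```java
/** Hypothetical helpers for Kerberos HiveServer2 URLs. */
final class KerberosUrlSketch {
    private static final String PREFIX = "jdbc:hive2://";

    /** Builds host:port/db;principal=...[;sasl.qop=...]; db may be null for "/" only. */
    static String kerberosUrl(String host, int port, String db, String serverPrincipal, String saslQop) {
        StringBuilder url = new StringBuilder(PREFIX)
                .append(host).append(':').append(port)
                .append('/').append(db == null ? "" : db)
                .append(";principal=").append(serverPrincipal);
        if (saslQop != null) {
            url.append(";sasl.qop=").append(saslQop);
        }
        return url.toString();
    }

    /** True if a "/" follows host:port before any ';', '?' or '#'. */
    static boolean hasSlashAfterPort(String url) {
        if (!url.startsWith(PREFIX)) {
            return false;
        }
        String rest = url.substring(PREFIX.length());
        for (int i = 0; i < rest.length(); i++) {
            char c = rest.charAt(i);
            if (c == '/') {
                return i > 0; // first "/" comes right after a non-empty host:port
            }
            if (c == ';' || c == '?' || c == '#') {
                return false; // parameters start before any "/"
            }
        }
        return false; // no "/" at all
    }
}
```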
Multi-User Scenarios and Programmatic Login to Kerberos KDC
In the current approach of using Kerberos, you need to have a valid Kerberos ticket in the ticket cache before connecting. This entails a static login (using kinit, a keytab, or the ticket cache) and the restriction of one Kerberos user per client. These restrictions limit usage in middleware systems and other multi-user scenarios, and in scenarios where the client wants to log in programmatically to the Kerberos KDC.
One way to mitigate the problem of multi-user scenarios is with secure proxy users (see HIVE-5155). Starting in Hive 0.13.0, support for secure proxy users has two components:
- Direct proxy access for privileged Hadoop users (HIVE-5155). This enables a privileged user to directly specify an alternate session user during the connection. If the connecting user has Hadoop-level privilege to impersonate the requested userid, then HiveServer2 will run the session as that requested user.
- Delegation token based connection for Oozie (OOZIE-1457). This is the common mechanism for Hadoop ecosystem components.
Proxy user privileges in the Hadoop ecosystem are associated with both user names and hosts. That is, the privilege is available for certain users from certain hosts. Delegation tokens in Hive are meant to be used if you are connecting from one authorized (blessed) machine and later you need to make a connection from another non-blessed machine. You get the delegation token from a blessed machine and connect using the delegation token from a non-blessed machine. The primary use case is Oozie, which gets a delegation token from the server machine and then gets another connection from a Hadoop task node.
If you are only making a JDBC connection as a privileged user from a single blessed machine, then direct proxy access is the simpler approach. You can just pass the user you need to impersonate in the JDBC URL by using the hive.server2.proxy.user=<user> parameter.
See examples in ProxyAuthTest.java.
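For direct proxy access, the impersonated user is just another session variable. The hypothetical helper below (not part of the Hive JDBC driver) inserts hive.server2.proxy.user into an existing URL, placing it before any hive_conf_list or hive_var_list section. The base URL, principal, and user names in the example are placeholders, and the connecting user still needs Hadoop-level proxy privileges.

```java
/** Hypothetical helper that adds hive.server2.proxy.user to a HiveServer2 URL. */
final class ProxyUserUrl {

    static String withProxyUser(String baseUrl, String proxyUser) {
        // sess_var_list ends where hive_conf_list ('?') or hive_var_list ('#') begins.
        int end = baseUrl.length();
        int question = baseUrl.indexOf('?');
        int hash = baseUrl.indexOf('#');
        if (question >= 0) {
            end = Math.min(end, question);
        }
        if (hash >= 0) {
            end = Math.min(end, hash);
        }
        String head = baseUrl.substring(0, end);
        String tail = baseUrl.substring(end);
        if (head.endsWith(";")) {
            head = head.substring(0, head.length() - 1); // avoid an empty ";;" entry
        }
        return head + ";hive.server2.proxy.user=" + proxyUser + tail;
    }
}
```

The resulting URL is then passed to DriverManager.getConnection() as usual while authenticated as the privileged user.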
Support for delegation tokens with the HiveServer2 binary transport mode (hive.server2.transport.mode) has been available since 0.13.0; support for this feature with HTTP transport mode was added in HIVE-13169, which should be part of Hive 2.1.0.
The other way is to use a pre-authenticated Kerberos Subject (see HIVE-6486). In this method, starting with Hive 0.13.0 the Hive JDBC client can use a pre-authenticated subject to authenticate to HiveServer2. This enables a middleware system to run queries as the user running the client.
Using Kerberos with a Pre-Authenticated Subject
To use a pre-authenticated subject you will need the following changes.
- Add hive-exec*.jar to the classpath in addition to the regular Hive JDBC jars (commons-configuration-1.6.jar and hadoop-core*.jar are not required).
- Add the auth=kerberos and kerberosAuthType=fromSubject JDBC URL properties in addition to the "principal" URL property.
- Open the connection in Subject.doAs().
The following code snippet illustrates the usage (refer to HIVE-6486 for a complete test case):
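A sketch of those three changes, assuming the caller already holds a Kerberos-authenticated javax.security.auth.Subject (for example, obtained through a JAAS LoginContext) and that hive-exec and the Hive JDBC jars are on the classpath. SubjectConnectSketch and its method names are hypothetical, and the host and principal are placeholders. Recent JDKs deprecate Subject.doAs(), but it is the call the steps above describe.

```java
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import javax.security.auth.Subject;

/** Hypothetical sketch of connecting with a pre-authenticated Kerberos Subject. */
final class SubjectConnectSketch {

    /** URL with principal plus auth=kerberos and kerberosAuthType=fromSubject. */
    static String fromSubjectUrl(String host, int port, String db, String serverPrincipal) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db
                + ";principal=" + serverPrincipal
                + ";auth=kerberos;kerberosAuthType=fromSubject";
    }

    /** Runs the action as the given Subject, unwrapping the checked exception. */
    static <T> T runAs(Subject subject, PrivilegedExceptionAction<T> action) throws Exception {
        try {
            return Subject.doAs(subject, action);
        } catch (PrivilegedActionException e) {
            throw e.getException();
        }
    }

    /** Opens the connection inside Subject.doAs(), as required above. */
    static Connection connect(Subject subject, String url) throws Exception {
        return runAs(subject, () -> DriverManager.getConnection(url));
    }
}
```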
Python Client
A Python client driver is available on github. For installation instructions, see Setting Up HiveServer2: Python Client Driver.
Ruby Client
A Ruby client driver is available on github at https://github.com/forward3d/rbhive.
Integration with SQuirrel SQL Client
- Download, install and start the SQuirrel SQL Client from the SQuirrel SQL website.
- Select 'Drivers -> New Driver...' to register Hive's JDBC driver that works with HiveServer2.
- Enter the driver name and example URL:
- Select 'Extra Class Path -> Add' to add the following jars from your local Hive and Hadoop distribution.
Version information
Hive JDBC standalone jars are used in Hive 0.14.0 onward (HIVE-538); for previous versions of Hive, use HIVE_HOME/build/dist/lib/*.jar instead. The hadoop-common jars are for Hadoop 2.0; for previous versions of Hadoop, use HADOOP_HOME/hadoop-*-core.jar instead.
- Select 'List Drivers'. This will cause SQuirrel to parse your jars for JDBC drivers and might take a few seconds. From the 'Class Name' input box, select the Hive driver for working with HiveServer2: org.apache.hive.jdbc.HiveDriver.
- Click 'OK' to complete the driver registration.
- Select 'Aliases -> Add Alias...' to create a connection alias to your HiveServer2 instance.
- Give the connection alias a name in the 'Name' input box.
- Select the Hive driver from the 'Driver' drop-down.
- Modify the example URL as needed to point to your HiveServer2 instance.
- Enter 'User Name' and 'Password' and click 'OK' to save the connection alias.
- To connect to HiveServer2, double-click the Hive alias and click 'Connect'.
When the connection is established you will see errors in the log console and might get a warning that the driver is not JDBC 3.0 compatible. These alerts are due to yet-to-be-implemented parts of the JDBC metadata API and can safely be ignored. To test the connection enter SHOW TABLES in the console and click the run icon.
Also note that when a query is running, support for the 'Cancel' button is not yet available.
Integration with DbVisSoftware's DbVisualizer
- Download, install and start DbVisualizer free or purchase DbVisualizer Pro from https://www.dbvis.com/.
- Follow instructions on github.
Advanced Features for Integration with Other Tools
Supporting Cookie Replay in HTTP Mode
Version 1.2.0 and later
This option is available starting in Hive 1.2.0.
HIVE-9709 introduced support for the JDBC driver to enable cookie replay. This is turned on by default so that incoming cookies can be sent back to the server for authentication.
The JDBC connection URL when enabled should look like this:
jdbc:hive2://<host>:<port>/<db>;transportMode=http;httpPath=<http_endpoint>;cookieAuth=true;cookieName=<cookie_name>
- cookieAuth is set to true by default.
- cookieName: If any of the incoming cookies' keys match the value of cookieName, the JDBC driver will not send any login credentials/Kerberos ticket to the server. The client will just send the cookie alone back to the server for authentication. The default value of cookieName is hive.server2.auth (this is the HiveServer2 cookie name).
- To turn off cookie replay, cookieAuth=false must be used in the JDBC URL.
- Important Note: As part of HIVE-9709, we upgraded the Apache http-client and http-core components of Hive to 4.4. To avoid any collision between this upgraded version of HttpComponents and any other versions that might be present in your system (such as the one provided by Apache Hadoop 2.6, which uses http-client and http-core version 4.2.5), the client is expected to set CLASSPATH so that Beeline-related jars appear before Hadoop lib jars. This is achieved by setting HADOOP_USER_CLASSPATH_FIRST=true before using hive-jdbc. In fact, bin/beeline.sh already does this.
Using 2-way SSL in HTTP Mode
Version 1.2.0 and later
This option is available starting in Hive 1.2.0.
HIVE-10447 enabled the JDBC driver to support 2-way SSL in HTTP mode. Note that HiveServer2 currently does not support 2-way SSL, so this feature is useful when there is an intermediate server, such as Knox, that requires the client to support 2-way SSL.
JDBC connection URL:
jdbc:hive2://<host>:<port>/<db>;ssl=true;twoWay=true;sslTrustStore=<trust_store_path>;trustStorePassword=<trust_store_password>;sslKeyStore=<key_store_path>;keyStorePassword=<key_store_password>;transportMode=http;httpPath=<http_endpoint>
- <trust_store_path> is the path where the client's truststore file lives. This is a mandatory non-empty field.
- <trust_store_password> is the password to access the truststore.
- <key_store_path> is the path where the client's keystore file lives. This is a mandatory non-empty field.
- <key_store_password> is the password to access the keystore.
For versions earlier than 0.14, see the version note above.
Passing HTTP Header Key/Value Pairs via JDBC Driver
Version 1.2.0 and later
This option is available starting in Hive 1.2.0.
HIVE-10339 introduced an option for clients to provide custom HTTP headers that can be sent to the underlying server (Hive 1.2.0 and later).
JDBC connection URL:
jdbc:hive2://<host>:<port>/<db>;transportMode=http;httpPath=<http_endpoint>;http.header.<name1>=<value1>;http.header.<name2>=<value2>
When this URL is specified, Beeline adds an HTTP header named <name1> with value <value1>, and another HTTP header named <name2> with value <value2>, to the underlying requests. This is helpful when the end user needs to send identity in an HTTP header down to intermediate servers such as Knox via Beeline for authentication, for example http.header.USERNAME=<value1>;http.header.PASSWORD=<value2>.
For versions earlier than 0.14, see the version note above.
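The hypothetical helper below (not part of the Hive JDBC driver) turns a map of header names and values into http.header.<name>=<value> session variables on an HTTP-mode URL. It appends values verbatim and therefore rejects characters that would end the session variable or start another URL section; the host and header values in the example are placeholders.

```java
import java.util.Map;

/** Hypothetical helper that appends http.header.<name>=<value> session variables. */
final class HttpHeaderUrl {

    static String withHeaders(String httpModeUrl, Map<String, String> headers) {
        // The sketch appends at the end, so the base URL must not have '?' or '#' sections.
        if (httpModeUrl.indexOf('?') >= 0 || httpModeUrl.indexOf('#') >= 0) {
            throw new IllegalArgumentException("Base URL must not contain '?' or '#'");
        }
        StringBuilder url = new StringBuilder(httpModeUrl);
        for (Map.Entry<String, String> header : headers.entrySet()) {
            String name = header.getKey();
            String value = header.getValue();
            // No escaping is done, so reject characters that would break the key=value pair.
            if (containsAny(name, ";=?#") || containsAny(value, ";?#")) {
                throw new IllegalArgumentException("Unsupported character in header " + name);
            }
            url.append(";http.header.").append(name).append('=').append(value);
        }
        return url.toString();
    }

    private static boolean containsAny(String s, String chars) {
        for (int i = 0; i < chars.length(); i++) {
            if (s.indexOf(chars.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```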