How to Install the Cloudera Hive ODBC Driver in AWS?

In this blog, we discuss how to install the Cloudera Hive ODBC driver on a Linux (RHEL) EC2 instance.

Apache Hive is a data warehouse tool built on top of Apache Hadoop for data query and analysis. Hive provides a SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop. It enables reading, writing, and managing large datasets in distributed storage. Hive queries are converted to a series of jobs that execute on a Hadoop cluster through MapReduce or Apache Spark. It also provides easy, familiar batch processing for Apache Hadoop.

A few key features of Hive

  • Provides a SQL-like interface
  • Shared data structure
  • Faster batch processing

A few common use cases of Hive

  • ETL
  • Data Mining
  • Data Preparation

Follow the steps below to install the Cloudera Hive ODBC driver on the AWS platform.

Step 1: Download the software from https://www.cloudera.com/downloads/connectors/hive/odbc/2-5-25.html

Downloaded file name: ClouderaHiveODBC-2.5.25.1020-1.el7.x86_64.rpm

Step 2: Log in to the EC2 instance as root and create the directory below

mkdir -p /tmp/cloudera-hive && cd /tmp/cloudera-hive

Step 3: Copy the RPM package into the directory created above

aws s3 cp s3://test-bucket/Hive/ClouderaHiveODBC-2.5.25.1020-1.el7.x86_64.rpm .

Step 4: Install the package

yum --nogpgcheck localinstall ClouderaHiveODBC-2.5.25.1020-1.el7.x86_64.rpm

Step 5: Verify the package

yum list installed | grep ClouderaHiveODBC
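As an additional check, the sketch below confirms that the driver library exists on disk. The path is the 64-bit driver path used in the configuration files in Step 7; the helper name `missing_driver_libs` is only illustrative, and the list should be adjusted if your installation places the library elsewhere.

```python
import os

# 64-bit driver library path, as referenced in the configuration files of Step 7.
DRIVER_LIBS = [
    "/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so",
]


def missing_driver_libs(paths=DRIVER_LIBS):
    """Return the driver library paths that do not exist on this machine."""
    return [path for path in paths if not os.path.isfile(path)]


if __name__ == "__main__":
    missing = missing_driver_libs()
    if missing:
        print("Missing driver libraries:", ", ".join(missing))
    else:
        print("All driver libraries found.")
```

An empty result suggests the RPM placed the library where the configuration expects it.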

Step 6: Create the directory if it does not exist

mkdir -p /opt/odbc_path/client/ODBC_64

Step 7: Go to the directory and create two ODBC files (odbc.ini and odbcinst.ini) inside it

cd /opt/odbc_path/client/ODBC_64

Create an odbc.ini file with the contents below

[ODBC]
#QEWSD=2458358
 InstallDir=/opt/teradata/client/ODBC_64
 Trace=no
 Pooling=yes
 [ODBC Data Sources]
 Teradata ODBC DSN=Teradata Database ODBC Driver 16.20
 [Teradata ODBC DSN]
 Description=Teradata Database ODBC Driver 16.20
 [testdsn]
 Driver=/opt/teradata/client/ODBC_64/lib/tdataodbc_sb64.so
 DBCName=<Your DB End Point>
 MechanismName=LDAP
 Username=<User Name>
 Password=<Password>
 Database=<DB Name>
 AccountString=
 CharacterSet=ASCII
 DatasourceDNSEntries=
 DateTimeFormat=AAA
 DefaultDatabase=
 DontUseHelpDatabase=0
 DontUseTitles=1
 EnableExtendedStmtInfo=1
 EnableReadAhead=1
 IgnoreODBCSearchPattern=0
 LogErrorEvents=0
 LoginTimeout=20
 MaxRespSize=65536
 MaxSingleLOBBytes=0
 MaxTotalLOBBytesPerRow=0
 MechanismName=
 NoScan=0
 PrintOption=N
 retryOnEINTR=1
 ReturnGeneratedKeys=N
 SessionMode=System Default
 SplOption=Y
 TABLEQUALIFIER=0
 TCPNoDelay=1
 TdmstPortNumber=1025
 UPTMode=Not set
 USE2XAPPCUSTOMCATALOGMODE=0
 UseDataEncryption=0
 UseDateDataForTimeStampParams=0
 USEINTEGRATEDSECURITY=0
 UseSequentialRetrievalOnly=0
 UseXViews=0
 [ODBC]
 Trace = 1
 TraceFile =
 [ODBC Data Sources]
 Cloudera Hive 32-bit=Cloudera ODBC Driver for Apache Hive 32-bit
 Hive=Cloudera ODBC Driver for Apache Hive 64-bit
 [Cloudera Hive 32-bit]
 Description=Cloudera ODBC Driver for Apache Hive (32-bit) DSN
 Driver=/opt/cloudera/hiveodbc/lib/32/libclouderahiveodbc32.so
 HOST=[HOST]
 PORT=[PORT]
 Schema=default
 ServiceDiscoveryMode=0
 ZKNamespace=
 HiveServerType=2
 AuthMech=2
 ThriftTransport=1
 UseNativeQuery=0
 UID=
 KrbHostFQDN=_HOST
 KrbServiceName=hive
 KrbRealm=
 SSL=0
 TwoWaySSL=0
 ClientCert=
 ClientPrivateKey=
 ClientPrivateKeyPassword=
 [Hive]
 Description=Cloudera ODBC Driver for Apache Hive (64-bit) DSN
 Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
 HOST=
 PORT=
 Schema=
 ServiceDiscoveryMode=0
 ZKNamespace=
 HiveServerType=2
 AuthMech=3
 ThriftTransport=1
 UseNativeQuery=0
 UID=
 PWD=
 KrbHostFQDN=_HOST
 KrbServiceName=hive
 KrbRealm=
 SSL=0
 TwoWaySSL=0
 ClientCert=
 ClientPrivateKey=
 ClientPrivateKeyPassword=

Create an odbcinst.ini file with the contents below

[ODBC Drivers]
Teradata Database ODBC Driver 16.20=Installed
[Teradata Database ODBC Driver 16.20]
Description=Teradata Database ODBC Driver 16.20
Driver=/opt/teradata/client/ODBC_64/lib/tdataodbc_sb64.so
[ODBC Drivers]
Cloudera ODBC Driver for Apache Hive 32-bit=Installed
Cloudera ODBC Driver for Apache Hive 64-bit=Installed
[Cloudera ODBC Driver for Apache Hive 32-bit]
Description=Cloudera ODBC Driver for Apache Hive (32-bit)
Driver=/opt/cloudera/hiveodbc/lib/32/libclouderahiveodbc32.so
[Cloudera ODBC Driver for Apache Hive 64-bit]
Description=Cloudera ODBC Driver for Apache Hive (64-bit)
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
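Since a typo in either file tends to surface only as a connection error later, it can help to list which sections actually define a Driver entry. The following is a minimal sketch using Python's standard configparser; the function name `dsn_drivers` and the sample text are illustrative. Leading whitespace is stripped first because configparser would otherwise treat indented lines as continuations, and `strict=False` lets it tolerate repeated section names such as [ODBC].

```python
import configparser


def dsn_drivers(ini_text):
    """Map each section that has a Driver key to its driver path."""
    parser = configparser.ConfigParser(
        strict=False,          # tolerate repeated sections and keys
        interpolation=None,    # treat '%' literally
        delimiters=("=",),     # ODBC ini files use '=' only
    )
    parser.optionxform = str   # keep key names exactly as written
    # Strip indentation so indented lines are not parsed as continuations.
    cleaned = "\n".join(line.strip() for line in ini_text.splitlines())
    parser.read_string(cleaned)
    return {
        section: parser[section]["Driver"]
        for section in parser.sections()
        if "Driver" in parser[section]
    }


if __name__ == "__main__":
    sample = """
    [ODBC Data Sources]
    Hive=Cloudera ODBC Driver for Apache Hive 64-bit
    [Hive]
    Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
    HOST=
    """
    for name, driver in dsn_drivers(sample).items():
        print(name, "->", driver)
```

To check the real files, read their text with `open(path).read()` and pass it to the function.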

Step 8: Set the environment variables

export ODBCINI=/opt/odbc_path/client/ODBC_64/odbc.ini 
export ODBCINSTINI=/opt/odbc_path/client/ODBC_64/odbcinst.ini

To avoid setting these variables in every session, you can also add the same export lines to your /etc/profile file.
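Before connecting, it may be worth confirming that both variables are visible to the process and point to real files. The sketch below is one way to do that; the helper name `check_odbc_env` is illustrative, and only the two variable names come from this step.

```python
import os


def check_odbc_env(environ=os.environ):
    """Return a list of problems found with the ODBC environment variables."""
    problems = []
    for name in ("ODBCINI", "ODBCINSTINI"):
        path = environ.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{name} points to a missing file: {path}")
    return problems


if __name__ == "__main__":
    issues = check_odbc_env()
    if issues:
        for issue in issues:
            print(issue)
    else:
        print("ODBC environment looks fine.")
```

An empty list means both variables are set and both files exist; it does not validate the contents of the files.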

You are now ready to test connectivity to Hive EDL with the pyodbc Python package. Below is an example that connects to Hive:

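import pyodbc

# Disable ODBC connection pooling (must be set before the first connection).
pyodbc.pooling = False

# "Hive" is the DSN defined in odbc.ini; replace the other values with your own.
conn_str = "DSN=Hive;HOST=Hostname;PORT=Port_No;UID=User_ID;PWD=Password"
con = pyodbc.connect(conn_str, autocommit=True)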
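To go one step further than opening a connection, the sketch below builds the connection string from separate values and runs a trivial query to confirm that Hive answers. The function names `build_conn_str` and `ping_hive`, the `SELECT 1` probe, and the example host and port are assumptions made for illustration; the keywords (DSN, HOST, PORT, UID, PWD) are the ones used above.

```python
def build_conn_str(dsn, host, port, uid, pwd):
    """Assemble an ODBC connection string from keyword/value pairs."""
    parts = {"DSN": dsn, "HOST": host, "PORT": port, "UID": uid, "PWD": pwd}
    return ";".join(f"{key}={value}" for key, value in parts.items())


def ping_hive(conn_str):
    """Open a connection, run a trivial query, and return its single value."""
    import pyodbc  # imported here so build_conn_str works without pyodbc installed

    con = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = con.cursor()
        cursor.execute("SELECT 1")
        return cursor.fetchone()[0]
    finally:
        con.close()


if __name__ == "__main__":
    # Placeholder values; replace them with your own before calling ping_hive.
    conn_str = build_conn_str("Hive", "hive.example.internal", 10000, "User_ID", "Password")
    print(conn_str.replace("Password", "****"))
```

Values that contain a semicolon would need extra quoting, which this sketch does not handle.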

I hope this blog helps you install the Cloudera Hive driver on the AWS platform. Please like this post and leave a comment if you have any questions about it.
