How to Install the Impala ODBC Driver on AWS

In this blog, we will discuss how to install the Impala ODBC driver on a Linux (RHEL) EC2 instance.

Impala is a tool, similar to Hive, for running SQL queries on data stored in HDFS or HBase. The Cloudera ODBC driver for Impala lets Business Intelligence (BI) applications with ODBC support access Hadoop data. ODBC is one of the most established and widely supported APIs for connecting to and working with databases. The driver translates ODBC calls from the application into SQL and passes the SQL queries to the underlying Impala engine.

Now, follow the steps below to install the Impala ODBC driver on your EC2 instance

Step 1: Download the software from https://www.cloudera.com/downloads/connectors/impala/odbc/2-6-0.html

Here, we will install the ClouderaImpalaODBC-2.6.0.1000-1.x86_64.rpm package from the official site

Step 2: Log in to your EC2 instance as root, go to /tmp, and execute the command below

aws s3 cp s3://test-bucket/impala/ClouderaImpalaODBC-2.6.0.1000-1.x86_64.rpm . 

Here, I had already uploaded the downloaded package to S3, so I copied the file from S3 to /tmp

Step 3: Install the driver using the command below

yum --nogpgcheck localinstall ClouderaImpalaODBC-2.6.0.1000-1.x86_64.rpm

Verify that the package is installed on the server using

rpm -qa | grep ClouderaImpalaODBC
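If you want to script this check, the same verification can be sketched in Python. This is only a minimal sketch: the helper names find_installed and query_rpm are my own for illustration, and query_rpm assumes the rpm command is available on the host.

```python
import subprocess

PACKAGE = "ClouderaImpalaODBC"  # package name used in the rpm query above


def find_installed(rpm_output, name=PACKAGE):
    """Return the lines of `rpm -qa` output that mention the package name."""
    return [line.strip() for line in rpm_output.splitlines() if name in line]


def query_rpm():
    """Run `rpm -qa` and return its text output (needs an RPM-based system)."""
    result = subprocess.run(
        ["rpm", "-qa"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling find_installed(query_rpm()) on the EC2 instance should return an empty list when the driver is not installed.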

Step 4: Create the directory if it does not exist

mkdir -p /opt/odbc_path/client/ODBC_64

Step 5: Go to the directory and create two ODBC files (odbc.ini and odbcinst.ini) inside it

cd /opt/odbc_path/client/ODBC_64

Create an odbc.ini file with the contents below, and provide your host IP and port number

 [ODBC Data Sources]
 TESTDSN=Cloudera ODBC Driver for Impala 64-bit
 [TESTDSN]
 Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
 #The DriverUnicodeEncoding setting is only used for SimbaDM
 #When set to 1, SimbaDM runs in UTF-16 mode.
 #When set to 2, SimbaDM runs in UTF-8 mode.
 #DriverUnicodeEncoding=2
 #Values for HOST, PORT, KrbFQDN, and KrbServiceName should be set here.
 #They can also be specified on the connection string.
 HOST=<Host IP>
 PORT=<Port No>
 Database=default
 #The authentication mechanism.
 #0 - No authentication (NOSASL)
 #1 - Kerberos authentication (SASL)
 #2 - Username authentication (SASL)
 #3 - Username/password authentication (NOSASL or SASL depending on UseSASL configuration)
 AuthMech=3
 #Set to 1 to use SASL for authentication.
 #Set to 0 to not use SASL.
 #When using Kerberos authentication (SASL) or Username authentication (SASL), SASL is always used
 #and this configuration is ignored. SASL is never used for No authentication (NOSASL).
 UseSASL=1
 #Kerberos related settings.
 KrbFQDN=_HOST
 KrbRealm=
 KrbServiceName=impala
 #Username/password authentication with SASL settings.
 UID=
 PWD=
 #Set to 0 to disable SSL.
 #Set to 1 to enable SSL.
 SSL=1
 CAIssuedCertNamesMismatch=1
 TrustedCerts=/opt/cloudera/impalaodbc/lib/64/cacerts.pem
 #General settings
 TSaslTransportBufSize=1000
 RowsFetchedPerBlock=10000
 SocketTimeout=0
 StringColumnLength=32767
 UseNativeQuery=0
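It is easy to leave the HOST or PORT placeholders unfilled in this file, so a quick sanity check can help before testing the connection. The sketch below uses Python's standard configparser module. The function name check_dsn and the choice of Driver, HOST, and PORT as required keys are my own assumptions, not something the driver mandates.

```python
import configparser

# Keys this sketch treats as mandatory (an assumption, not a driver rule)
REQUIRED_KEYS = ("Driver", "HOST", "PORT")


def check_dsn(ini_text, dsn="TESTDSN"):
    """Return a list of problems found in the DSN section of odbc.ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written (HOST, PORT, ...)
    # Strip leading spaces so indented lines are not read as continuations
    cleaned = "\n".join(line.strip() for line in ini_text.splitlines())
    parser.read_string(cleaned)
    if not parser.has_section(dsn):
        return ["section [%s] is missing" % dsn]
    section = parser[dsn]
    problems = []
    for key in REQUIRED_KEYS:
        value = section.get(key, "").strip()
        # An empty value or a leftover <...> placeholder counts as a problem
        if not value or value.startswith("<"):
            problems.append("%s is empty or still a placeholder" % key)
    return problems
```

For example, check_dsn(open('/opt/odbc_path/client/ODBC_64/odbc.ini').read()) returns an empty list when the three keys are filled in.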

Create an odbcinst.ini file with the contents below

 [ODBC Drivers]
 Cloudera ODBC Driver for Impala 64-bit=Installed
 [Cloudera ODBC Driver for Impala 64-bit]
 Description=Cloudera ODBC Driver for Impala (64-bit)
 Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
 #The option below is for using unixODBC when compiled with -DSQL_WCHART_CONVERT.
 #Execute 'odbc_config --cflags' to determine if you need to uncomment it.
 #IconvEncoding=UCS-4LE

Step 6: Set the environment variables

export ODBCINI=/opt/odbc_path/client/ODBC_64/odbc.ini
export ODBCINSTINI=/opt/odbc_path/client/ODBC_64/odbcinst.ini
export CLOUDERAIMPALAODBCINI=/opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini

You can also add these exports to your /etc/profile file so that the environment variables do not have to be set in every session
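A missing or mistyped variable usually shows up later as a confusing connection error, so it can be worth checking the three variables up front. Here is a minimal sketch. The function name check_odbc_env and the status strings are hypothetical, chosen only for illustration.

```python
import os

# The three variables exported in Step 6
ODBC_ENV_VARS = ("ODBCINI", "ODBCINSTINI", "CLOUDERAIMPALAODBCINI")


def check_odbc_env(environ=None):
    """Map each ODBC variable to 'ok', 'unset', or 'file not found'."""
    environ = os.environ if environ is None else environ
    status = {}
    for name in ODBC_ENV_VARS:
        path = environ.get(name)
        if not path:
            status[name] = "unset"
        elif not os.path.isfile(path):
            status[name] = "file not found"
        else:
            status[name] = "ok"
    return status
```

Running print(check_odbc_env()) in the same shell session where you exported the variables shows at a glance which one needs attention.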

Now you are all set to test connectivity to Impala using the pyodbc or impyla Python packages. Below is an example that connects to Impala using pyodbc

import pyodbc

# Disable ODBC connection pooling (must be set before the first connection)
pyodbc.pooling = False

# Configuration settings for the ODBC connection
# Replace the host, port, username, and password placeholders with your own values
# SSL and AllowSelfSignedServerCert are passed as 1, matching the 0/1 style used in odbc.ini
cfg = {'DSN': 'TESTDSN', 'host': 'hostname', 'port': 'portno', 'username': 'username', 'password': 'password', 'SSL': 1, 'AllowSelfSignedServerCert': 1}
conString = 'DSN={DSN};Host={host};Port={port};UID={username};PWD={password};SSL={SSL};AllowSelfSignedServerCert={AllowSelfSignedServerCert}'.format(**cfg)

try:
    # Create connection
    conEDL = pyodbc.connect(conString, autocommit=True)
    print(conEDL)
    cursor = conEDL.cursor()
    print(cursor)
    print('connection established successfully')
except Exception as err:
    print('Error is :', err)
    print('Failed to establish connection')
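Once the connection is established, you would normally run a query through it. The helper below is a small sketch that relies only on the standard Python DB-API cursor methods (execute, description, fetchall), which pyodbc provides. The function name fetch_rows is my own and not part of pyodbc.

```python
def fetch_rows(connection, sql):
    """Run a query on a DB-API connection and return (column_names, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        # cursor.description holds one tuple per column; item 0 is the name
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return columns, rows


# Usage with the connection object from the example above (conEDL):
#   columns, rows = fetch_rows(conEDL, "SELECT 1")
```

Because the helper only uses DB-API calls, you can try it locally against any DB-API connection (for example sqlite3) before pointing it at Impala.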

I hope this blog helps you install the Impala ODBC driver on the AWS platform. Please like and comment if you have any questions related to this blog.
