In this blog, we discuss how to install the Impala ODBC driver on a Linux RHEL EC2 instance.
Impala is a tool, similar to Hive, for running SQL queries on data residing in HDFS/HBase. The Cloudera ODBC driver for Impala enables users to access Hadoop data through Business Intelligence (BI) applications with ODBC support. ODBC is one of the most established and widely supported APIs for connecting to and working with databases. The driver does this by translating ODBC calls from the application into SQL and passing the SQL queries to the underlying Impala engine.

Now, follow the steps below to install the Impala ODBC driver on your EC2 instance.
Step 1: Download the software from https://www.cloudera.com/downloads/connectors/impala/odbc/2-6-0.html
Here, we install the ClouderaImpalaODBC-2.6.0.1000-1.x86_64.rpm package from the official site.
Step 2: Log in to your EC2 instance as root, go to /tmp, and execute the command below
aws s3 cp s3://test-bucket/impala/ClouderaImpalaODBC-2.6.0.1000-1.x86_64.rpm .
Here, I had already uploaded the package to S3, so I copied the file from S3 to /tmp.
Step 3: Install the driver using the command below
yum --nogpgcheck localinstall ClouderaImpalaODBC-2.6.0.1000-1.x86_64.rpm
Verify that the package is installed on the server using
rpm -qa | grep ClouderaImpalaODBC
Step 4: Create the directory if it does not exist
mkdir -p /opt/odbc_path/client/ODBC_64
Step 5: Go to the directory and create two ODBC files (odbc.ini and odbcinst.ini) inside it
cd /opt/odbc_path/client/ODBC_64
Create an odbc.ini file with the contents below, and provide your host IP and port number
[ODBC Data Sources]
TESTDSN=Cloudera ODBC Driver for Impala 64-bit

[TESTDSN]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so

#The DriverUnicodeEncoding setting is only used for SimbaDM
#When set to 1, SimbaDM runs in UTF-16 mode.
#When set to 2, SimbaDM runs in UTF-8 mode.
#DriverUnicodeEncoding=2

#Values for HOST, PORT, KrbFQDN, and KrbServiceName should be set here.
#They can also be specified on the connection string.
HOST=<Host IP>
PORT=<Port No>
Database=default

#The authentication mechanism.
#0 - No authentication (NOSASL)
#1 - Kerberos authentication (SASL)
#2 - Username authentication (SASL)
#3 - Username/password authentication (NOSASL or SASL depending on UseSASL configuration)
AuthMech=3

#Set to 1 to use SASL for authentication.
#Set to 0 to not use SASL.
#When using Kerberos authentication (SASL) or Username authentication (SASL) SASL is always used
#and this configuration is ignored. SASL is always not used for No authentication (NOSASL).
UseSASL=1

#Kerberos related settings.
KrbFQDN=_HOST
KrbRealm=
KrbServiceName=impala

#Username/password authentication with SASL settings.
UID=
PWD=

#Set to 0 to disable SSL.
#Set to 1 to enable SSL.
SSL=1
CAIssuedCertNamesMismatch=1
TrustedCerts=/opt/cloudera/impalaodbc/lib/64/cacerts.pem

#General settings
TSaslTransportBufSize=1000
RowsFetchedPerBlock=10000
SocketTimeout=0
StringColumnLength=32767
UseNativeQuery=0
Create an odbcinst.ini file with the contents below
[ODBC Drivers]
Cloudera ODBC Driver for Impala 64-bit=Installed

[Cloudera ODBC Driver for Impala 64-bit]
Description=Cloudera ODBC Driver for Impala (64-bit)
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so

#The option below is for using unixODBC when compiled with -DSQL_WCHART_CONVERT.
#Execute 'odbc_config --cflags' to determine if you need to uncomment it.
#IconvEncoding=UCS-4LE
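Before moving on, it can help to confirm that odbc.ini parses cleanly and that the DSN section carries the keys the driver needs. The snippet below is a minimal sketch using Python's standard configparser module. The helper name read_dsn and the sample host and port values are illustrative assumptions, not part of the Cloudera driver.

```python
import configparser

def read_dsn(ini_text, dsn):
    """Return the key/value settings of one DSN section from odbc.ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (HOST, PORT, AuthMech) as written
    parser.read_string(ini_text)
    if dsn not in parser:
        raise KeyError(f"DSN {dsn!r} not found in odbc.ini")
    return dict(parser[dsn])

# Shortened sample of the odbc.ini shown above; HOST and PORT are made-up values.
SAMPLE = """\
[ODBC Data Sources]
TESTDSN=Cloudera ODBC Driver for Impala 64-bit

[TESTDSN]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
#Lines starting with # are skipped as comments
HOST=10.0.0.1
PORT=21050
AuthMech=3
SSL=1
"""

settings = read_dsn(SAMPLE, "TESTDSN")
# Report any required key that is absent or left empty
missing = [key for key in ("Driver", "HOST", "PORT") if not settings.get(key)]
print("missing keys:", missing)
```

To check the real file, read /opt/odbc_path/client/ODBC_64/odbc.ini with open() and pass its text to read_dsn in place of SAMPLE.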
Step 6: Set the environment variables
export ODBCINI=/opt/odbc_path/client/ODBC_64/odbc.ini
export ODBCINSTINI=/opt/odbc_path/client/ODBC_64/odbcinst.ini
export CLOUDERAIMPALAODBCINI=/opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini
You can also add these lines to your /etc/profile file so that the environment variables do not have to be set in every new session.
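A quick way to confirm that the variables are visible to Python, and that each one points at a real file, is sketched below. The function name check_odbc_env and the status strings are illustrative choices. Only the three variable names come from the step above.

```python
import os

# The three variables exported in Step 6
REQUIRED_VARS = ("ODBCINI", "ODBCINSTINI", "CLOUDERAIMPALAODBCINI")

def check_odbc_env(env=None):
    """Map each required variable to 'ok', 'not set', or 'file not found'."""
    env = os.environ if env is None else env
    report = {}
    for name in REQUIRED_VARS:
        path = env.get(name)
        if not path:
            report[name] = "not set"
        elif not os.path.isfile(path):
            report[name] = "file not found"
        else:
            report[name] = "ok"
    return report

if __name__ == "__main__":
    # Print one line per variable for the current shell environment
    for name, status in check_odbc_env().items():
        print(f"{name}: {status}")
```

Run it in the same shell session where the exports were made. A "not set" result usually means the script was started from a session that has not sourced /etc/profile.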
Now you are all set to test connectivity to Impala using the pyodbc or impyla Python packages. Below is an example that connects to Impala using pyodbc
import pyodbc

# Disable ODBC connection pooling (must be set before the first connection is opened)
pyodbc.pooling = False

try:
    # Configuration settings for the ODBC connection; replace the placeholder values.
    # 21050 is the usual Impala ODBC port; use your own port number if it differs.
    # SSL and AllowSelfSignedServerCert are passed as 1/0, as in odbc.ini.
    cfg = {'DSN': 'TESTDSN', 'host': 'hostname', 'port': 21050,
           'username': 'username', 'password': 'password',
           'SSL': 1, 'AllowSelfSignedServerCert': 1}
    conString = 'DSN={0};Host={1};Port={2};UID={3};PWD={4};SSL={5};AllowSelfSignedServerCert={6}'.format(
        cfg['DSN'], cfg['host'], cfg['port'], cfg['username'],
        cfg['password'], cfg['SSL'], cfg['AllowSelfSignedServerCert'])

    # Create connection; autocommit is passed here, it is not a module-level setting
    conEDL = pyodbc.connect(conString, autocommit=True)
    print(conEDL)
    cursor = conEDL.cursor()
    print(cursor)
    print('connection established successfully')
except Exception as err:
    print('Error is :', err)
    print('Failed to establish connection')
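Once the connection opens, running a trivial query confirms that the driver can execute statements end to end. The helper below is a sketch that relies only on the standard Python DB-API calls (cursor, execute, description, fetchall), which pyodbc also provides. The name run_query is illustrative, and the demo uses an in-memory SQLite database purely as a stand-in so the snippet runs without an Impala cluster.

```python
import sqlite3  # stand-in only, so the sketch runs without an Impala cluster

def run_query(connection, sql):
    """Execute sql on a DB-API connection and return (column_names, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        # description holds one entry per result column; the name comes first
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
        return columns, rows
    finally:
        cursor.close()

# Demo against SQLite; with the driver installed you would pass the pyodbc
# connection (conEDL in the example above) instead of demo_conn.
demo_conn = sqlite3.connect(":memory:")
columns, rows = run_query(demo_conn, "SELECT 1 AS ok")
print(columns, rows)
demo_conn.close()
```

Against Impala, a call such as run_query(conEDL, "SELECT 1") should behave the same way, assuming the DSN and credentials are valid.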
I hope this blog helps you install the Impala ODBC driver on the AWS platform. Please like and comment if you have any questions related to this blog.