In this article, we demonstrate how to launch an EMR cluster in AWS. To learn more about EMR, refer to Understanding AWS EMR.
What are the Prerequisites?
Before you launch a new cluster, make sure the following prerequisites are in place.
- EC2 Key-Pair – This is required to connect to your cluster nodes over SSH.
- S3 folder – This is required by Amazon EMR to store the log data for your cluster.
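Both prerequisites can also be created from code instead of the console. The following is a minimal sketch using boto3 (the AWS SDK for Python), assuming it is installed and your AWS credentials are configured. The key name, bucket name, region and the "emr-logs" prefix are placeholders chosen for illustration, not values from this article.

```python
import os


def build_log_uri(bucket: str, prefix: str = "emr-logs") -> str:
    """Return the S3 folder path that will be handed to EMR for its logs."""
    return f"s3://{bucket}/{prefix.strip('/')}/"


def create_prerequisites(key_name: str, bucket: str, region: str) -> str:
    """Create an EC2 key pair and an S3 bucket, then return the log folder URI.

    Assumption: neither the key pair nor the bucket exists yet.
    """
    import boto3  # imported lazily so build_log_uri works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    key = ec2.create_key_pair(KeyName=key_name)
    # The private key is returned only once, so save it for the SSH step.
    pem_path = f"{key_name}.pem"
    with open(pem_path, "w") as handle:
        handle.write(key["KeyMaterial"])
    os.chmod(pem_path, 0o400)  # ssh refuses keys that are readable by others

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 does not accept an explicit LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    return build_log_uri(bucket)
```

Calling create_prerequisites with your own key name, bucket name and region returns the S3 folder path to use for logging when you create the cluster.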
Launch an EMR Cluster
The following steps launch an EMR cluster in AWS.
1. Sign in to the AWS console (link) and search for the EMR service
2. Click on Create Cluster
3. Provide the cluster details
AWS EMR lets you create a cluster in two ways. Quick options is the fastest way: it creates a cluster with default values for everything except a few parameters. Advanced options lets you configure the values for your cluster yourself.
The Create Cluster page has the following parts for both quick and advanced options. In this demonstration, we follow the quick options.
- Cluster Name – Provide a name for your cluster.
- Logging – This allows AWS EMR to write log data for your cluster. Provide your S3 bucket folder path to store your cluster logs.
- Launch Mode – This specifies whether you need a long-running cluster (Cluster) or a cluster that terminates after its steps have executed (Step execution).
- Release – This specifies the release version of your EMR cluster
- Applications – This option allows you to select the list of open-source big data applications to be installed on your cluster.
- Instance Type – This determines the EC2 instance type used for the nodes of your EMR cluster.
- Number of Instances – This determines the number of EC2 instances to be launched for your cluster. At least one node (the master node) is required to launch a cluster.
Here, we proceed with the m5.xlarge instance type and 1 node only.
Security and Access
- EC2 Key Pair – This is required to SSH into your EC2 instances.
- Permissions – AWS lets you select either the default or a custom role for your cluster.
Provide your own EC2 key pair and proceed to create the cluster.
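The same quick-option settings can be expressed as a request to the EMR API. Below is a hedged sketch with boto3 that maps each console field to a run_job_flow parameter. The release label "emr-6.15.0" and the application list are assumptions picked for illustration (the article does not name them), and the default roles EMR_DefaultRole and EMR_EC2_DefaultRole are assumed to already exist in your account.

```python
def build_cluster_request(name, log_uri, key_name,
                          release_label="emr-6.15.0",       # assumed release
                          applications=("Hadoop", "Spark"),  # assumed apps
                          instance_type="m5.xlarge",
                          instance_count=1,
                          keep_alive=True):
    """Map the console's quick options onto run_job_flow parameters."""
    instances = {
        "MasterInstanceType": instance_type,
        "InstanceCount": instance_count,
        # True keeps a long-running cluster (Launch Mode: Cluster);
        # False terminates the cluster once its steps have finished.
        "KeepJobFlowAliveWhenNoSteps": keep_alive,
        "Ec2KeyName": key_name,
    }
    if instance_count > 1:
        # Additional nodes beyond the master need their own instance type.
        instances["SlaveInstanceType"] = instance_type
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in applications],
        "Instances": instances,
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile
    }


def launch_cluster(request, region):
    """Submit the request and return the new cluster ID."""
    import boto3  # imported lazily; only needed when actually launching

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builder separate from the API call makes it easy to print and review the settings before any instance is started.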
4. Verify Cluster Status
Click on Create cluster to launch your first EMR cluster in AWS with 1 master node. Once you have initiated the creation, the cluster state is shown as Starting.
It typically takes a few minutes to provision the EC2 instance for your EMR cluster.
Once the cluster status changes to Running, the instance is available for data analysis and processing by the open-source software installed on your EMR cluster.
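If you would rather not refresh the console, the status can be polled from code. This is a rough sketch, assuming boto3 and a cluster ID you already have. The API reports states in upper case (for example STARTING, BOOTSTRAPPING, RUNNING, WAITING), and the polling interval and retry count here are arbitrary choices.

```python
import time

READY_STATES = {"RUNNING", "WAITING"}
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_until_ready(get_state, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Call get_state() repeatedly until the cluster is usable or has failed."""
    for _ in range(max_polls):
        state = get_state()
        if state in READY_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"cluster ended in state {state}")
        sleep(poll_seconds)
    raise TimeoutError("cluster did not become ready in time")


def emr_state_getter(cluster_id, region):
    """Return a function that reads the current cluster state from EMR."""
    import boto3  # imported lazily; only needed against a real cluster

    emr = boto3.client("emr", region_name=region)

    def get_state():
        response = emr.describe_cluster(ClusterId=cluster_id)
        return response["Cluster"]["Status"]["State"]

    return get_state
```

A call such as wait_until_ready(emr_state_getter(cluster_id, region)) blocks until the cluster can accept work. boto3 also ships a built-in "cluster_running" waiter for EMR clients that serves the same purpose.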
AWS EMR then writes its log output to your S3 location.
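To check what has been written, you can list the objects under the log folder. The sketch below assumes boto3 and assumes that EMR places each cluster's logs in a sub-folder named after the cluster ID beneath the log path you configured. The function names are illustrative.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/some/key/' into ('bucket', 'some/key/')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


def cluster_log_prefix(log_uri, cluster_id):
    """Return the bucket and key prefix where this cluster's logs are expected."""
    bucket, key = split_s3_uri(log_uri)
    # lstrip handles a log URI that points at the bucket root.
    prefix = f"{key.strip('/')}/{cluster_id}/".lstrip("/")
    return bucket, prefix


def list_cluster_logs(log_uri, cluster_id, limit=20):
    """List up to `limit` log object keys for the cluster."""
    import boto3  # imported lazily; only needed against a real bucket

    bucket, prefix = cluster_log_prefix(log_uri, cluster_id)
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=limit)
    return [item["Key"] for item in response.get("Contents", [])]
```

Log files are delivered periodically, so the listing can be empty for the first few minutes after launch.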
Connect to your EMR Cluster
The cluster we created with 1 node is now ready to use. We connect to the EC2 instance over SSH using the same key pair that we specified at cluster launch.
1. Connect to your master node using SSH
If you face a connection timeout, make sure port 22 (SSH) is allowed for your IP address in the inbound rules of your master node's security group.
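The inbound rule can be added from code as well. Here is a minimal sketch with boto3 that opens port 22 to a single IPv4 address only. The security group ID and the IP address are values you supply; the function names are illustrative.

```python
import ipaddress


def build_ssh_ingress_rule(my_ip):
    """Build an inbound rule that allows SSH (TCP port 22) from one IPv4 address."""
    ipaddress.IPv4Address(my_ip)  # raises ValueError if the address is malformed
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        # /32 restricts the rule to exactly this address.
        "IpRanges": [{"CidrIp": f"{my_ip}/32", "Description": "SSH from my IP"}],
    }


def allow_ssh(security_group_id, my_ip, region):
    """Attach the SSH rule to the master node's security group."""
    import boto3  # imported lazily; only needed against a real account

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpPermissions=[build_ssh_ingress_rule(my_ip)],
    )
```

The master security group ID is shown on the cluster's summary page in the console. It should also be available from describe_cluster under Cluster, Ec2InstanceAttributes, EmrManagedMasterSecurityGroup.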
AWS EMR lets you log in as the hadoop user, which has root privileges (through sudo) on the server.
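For reference, the SSH invocation can be assembled as follows. This is a small sketch: the key file name and the master node's public DNS name are placeholders that you replace with your own values.

```python
import shlex


def build_ssh_command(key_path, master_dns, user="hadoop"):
    """Return the ssh argument list for logging in to the master node."""
    return ["ssh", "-i", key_path, f"{user}@{master_dns}"]


def as_shell_string(command):
    """Render the argument list as a string you can paste into a terminal."""
    return shlex.join(command)
```

Passing the list from build_ssh_command to subprocess.run opens the session from Python, while as_shell_string gives the equivalent command line. The master node's public DNS name is shown on the cluster summary page and is also returned by describe_cluster as MasterPublicDnsName.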
You are now able to launch and connect to your EMR cluster in AWS.
Please leave a comment below if you have any questions about this blog.