With Amazon Elastic MapReduce (Amazon EMR) you can analyze and process vast amounts of data. It does this by distributing the computational work across a cluster of virtual servers running in the Amazon cloud. The cluster is managed using an open-source framework called Hadoop.

Assuming you already have an AWS account, let's create an EMR cluster to analyse a web server log file. Amazon EMR uses Amazon S3 to store input data, log files, and output data. Therefore, before we create the EMR cluster, we need to create an S3 bucket for the inputs and outputs.


Note: The bucket name must be unique across all of Amazon S3. Also make sure the bucket's Region is the same as your EC2 Region to avoid unnecessary charges.

Create two folders in the bucket, one named “logs” and one named “output”.
You will then have two S3 folders with the following paths:
s3://emrbucketname/output
s3://emrbucketname/logs

Replace emrbucketname with your own bucket name.
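If you would rather script this step than click through the console, the following is a minimal sketch of the same bucket and folder setup. It assumes you pass in a boto3 S3 client (for example `boto3.client("s3", region_name=region)`) with working credentials. The function names `emr_s3_layout` and `create_bucket_and_folders` are made up for this illustration.

```python
def emr_s3_layout(bucket_name):
    """Return the two S3 URIs used in this walkthrough for a given bucket."""
    return {
        "logs": f"s3://{bucket_name}/logs",
        "output": f"s3://{bucket_name}/output",
    }


def create_bucket_and_folders(s3_client, bucket_name, region):
    """Create the bucket plus empty "logs/" and "output/" folder markers.

    s3_client is assumed to be a boto3 S3 client created for `region`.
    """
    if region == "us-east-1":
        # us-east-1 is the default location and rejects an explicit constraint.
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    # S3 has no real folders; a zero-byte object whose key ends in "/"
    # is what the console displays as a folder.
    for folder in ("logs", "output"):
        s3_client.put_object(Bucket=bucket_name, Key=f"{folder}/")
    return emr_s3_layout(bucket_name)
```

Calling `emr_s3_layout("emrbucketname")` gives back the same two paths listed above, so you can reuse them when configuring the cluster.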

Once you have the above ready, you may create an EMR cluster as below:

Set your cluster name, and keep three important points in mind:
a) The log folder S3 location must be the one you created above: s3://emrbucketname/logs
b) Install Hive, Pig, Hue and Spark so you can play around with them later.
c) For this scenario, one large master instance and two large core instances are recommended. Set the number of task instances to 0.
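The same settings can be expressed as a request for boto3's `run_job_flow` call. This is only a sketch of how the three points above map onto that request. The helper name `build_cluster_request` is hypothetical, the release label and instance type are left as parameters because the post does not specify them, and the two role names assume the default EMR roles already exist in your account.

```python
def build_cluster_request(cluster_name, bucket_name, release_label, instance_type):
    """Build keyword arguments for an EMR run_job_flow call.

    Mirrors the console choices: logs go to the bucket's logs folder,
    Hive/Pig/Hue/Spark are installed, and the cluster has 1 master node,
    2 core nodes and no task nodes.
    """
    def group(name, role, count):
        return {
            "Name": name,
            "InstanceRole": role,
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    return {
        "Name": cluster_name,
        "LogUri": f"s3://{bucket_name}/logs",
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in ("Hive", "Pig", "Hue", "Spark")],
        "Instances": {
            # No TASK group is listed, which corresponds to 0 task instances.
            "InstanceGroups": [
                group("Master", "MASTER", 1),
                group("Core", "CORE", 2),
            ],
            # Keep the cluster running after creation so it can be explored.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles have been created in this account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With a boto3 EMR client you could then call something like `emr.run_job_flow(**build_cluster_request("my-cluster", "emrbucketname", release_label, instance_type))`, filling in a release label and instance type that are available in your Region.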

To save cost, you may also request Spot instances by setting a bid price. Spot instances cost much less than on-demand ones, but you only get them while your bid is high enough at that particular time.
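In an API request, a Spot bid is just two extra fields on an instance group. The sketch below shows that shape. The helper name `with_spot_bid`, the instance type and the bid of 0.05 USD are all placeholders I picked for illustration, not recommended values.

```python
def with_spot_bid(instance_group, max_price_usd):
    """Return a copy of an EMR instance-group dict that requests Spot capacity."""
    spot_group = dict(instance_group)
    spot_group["Market"] = "SPOT"
    # The EMR API takes the bid as a string, in USD per instance-hour.
    spot_group["BidPrice"] = f"{max_price_usd:.3f}"
    return spot_group


# Hypothetical on-demand core group, converted to a Spot request.
core_group = {
    "Name": "Core",
    "InstanceRole": "CORE",
    "Market": "ON_DEMAND",
    "InstanceType": "m4.large",
    "InstanceCount": 2,
}
spot_core_group = with_spot_bid(core_group, 0.05)
```

The original group is left untouched, so you can keep the master node on-demand and only bid for the core nodes if you prefer.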

You will see the below once the cluster has been created successfully.

From here onward, you can run Pig or Hive scripts and query your data using Hue. Play around with all the applications you have installed.

Warning: Please terminate your cluster once you have finished playing around. Never leave your cluster idle, because charges keep accruing even when the cluster is doing no work. I got distracted by other work, left my cluster idle for 18 hours, and then realized it had incurred charges of more than USD 22.
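To make this harder to forget, here is a rough sketch of a clean-up helper. It assumes you pass in a boto3 EMR client (`boto3.client("emr")`), treats clusters in the WAITING state as idle, and only reads the first page of results. The function name `terminate_idle_clusters` is my own.

```python
def terminate_idle_clusters(emr_client):
    """Terminate every cluster currently in the WAITING (idle) state.

    Only the first page of list_clusters results is read, which is enough
    for an account with a handful of clusters. Returns the terminated IDs.
    """
    response = emr_client.list_clusters(ClusterStates=["WAITING"])
    idle_ids = [cluster["Id"] for cluster in response.get("Clusters", [])]
    if idle_ids:
        # terminate_job_flows shuts the clusters down and stops the billing
        # for their instances once termination completes.
        emr_client.terminate_job_flows(JobFlowIds=idle_ids)
    return idle_ids
```

Note that this terminates every idle cluster it finds, so only run it in an account where that is really what you want.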

