With Amazon Elastic MapReduce (Amazon EMR) you can analyze and process vast amounts of data. It does this by distributing the computational work across a cluster of virtual servers running in the Amazon cloud. The cluster is managed using an open-source framework called Hadoop.
Assuming you already have an AWS account, let's start creating an EMR cluster to analyse a web server log file. Amazon EMR uses Amazon S3 to store input data, log files, and output data. Therefore, before we create the EMR cluster, we need to create an S3 bucket for the inputs and outputs.
Note: the bucket name must be unique across all existing S3 bucket names, and the bucket should be in the same Region as your EMR cluster's EC2 instances to avoid unnecessary data transfer charges.
Create a folder named “logs” and another named “output” in the bucket, so that you have two S3 folders with the following paths:
s3://emrbucketname/output
s3://emrbucketname/logs
Replace emrbucketname with your own bucket name.
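If you prefer scripting this step to clicking through the S3 console, the sketch below does the same thing with boto3. It assumes boto3 is installed and AWS credentials are configured; "emrbucketname" and the Region are placeholders. S3 has no real directories, so a "folder" here is just a zero-byte object whose key ends in "/", which is what the console displays as a folder.

```python
"""Sketch: create the S3 bucket and the "logs"/"output" folders with boto3.

Assumptions (not from the original post): boto3 is installed, AWS credentials
are configured, and the bucket name and Region are placeholders to replace.
"""

FOLDERS = ("logs", "output")


def folder_uris(bucket: str) -> dict:
    """Return the s3:// URI of each folder, e.g. s3://emrbucketname/logs."""
    return {name: f"s3://{bucket}/{name}" for name in FOLDERS}


def create_bucket_and_folders(bucket: str, region: str) -> None:
    # Imported lazily so folder_uris() can be used without boto3 installed.
    import boto3

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default and must not be passed as a LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    # A zero-byte object with a trailing "/" shows up as a folder in the console.
    for name in FOLDERS:
        s3.put_object(Bucket=bucket, Key=f"{name}/")
```

For example, `create_bucket_and_folders("emrbucketname", "ap-southeast-1")` would create the bucket and both folders in that Region; `folder_uris` just gives you the paths to paste into the EMR settings.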
Once the bucket and folders are ready, create an EMR cluster from the EMR console as follows:
Set your cluster name, and pay attention to three important settings:
a) The log folder S3 location must be the one you created above: s3://emrbucketname/logs
b) Install Hive, Pig, Hue and Spark so you can experiment with them later.
c) For this scenario, we recommend 1 master instance and 2 core instances of a large instance type, with 0 task instances.
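The same three settings can be expressed as arguments to boto3's EMR `run_job_flow` call, which is handy if you want to recreate the cluster later without the console. This is a sketch under assumptions not stated in the post: the release label "emr-6.15.0", the instance type "m5.xlarge" (check the EMR supported instance types for your Region), and the default role names created by `aws emr create-default-roles`.

```python
"""Sketch: the console settings above as boto3 run_job_flow() arguments.

Assumptions (not from the original post): release label, instance type,
and the default EMR role names.
"""


def cluster_config(name: str, bucket: str,
                   release: str = "emr-6.15.0",
                   instance_type: str = "m5.xlarge") -> dict:
    return {
        "Name": name,
        "ReleaseLabel": release,
        # a) must point at the logs folder created earlier
        "LogUri": f"s3://{bucket}/logs",
        # b) applications to install on the cluster
        "Applications": [{"Name": app} for app in ("Hive", "Pig", "Hue", "Spark")],
        # c) 1 master + 2 core instances; no task instance group at all
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": 2},
            ],
            # keep the cluster running after startup so you can work interactively
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(config: dict, region: str) -> str:
    """Start the cluster and return its ID (requires boto3 and credentials)."""
    import boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**config)["JobFlowId"]
```

Keeping the configuration as a plain dictionary makes it easy to review (or version-control) before anything is actually launched and billed.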
To save cost, you can also request Spot Instances by setting a maximum (bid) price. Spot Instances run at a much lower price than On-Demand Instances, but only while the current Spot price stays at or below your maximum price; otherwise the instances can be reclaimed.
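In the `run_job_flow` instance group format, a Spot request is an instance group with `"Market": "SPOT"` and a `"BidPrice"` given as a string in USD. The helper below is a hypothetical sketch; the price "0.10" in the usage line is an illustrative placeholder, not a recommended bid.

```python
"""Sketch: an EMR instance group definition that requests Spot capacity."""


def spot_group(role: str, instance_type: str, count: int, bid_price_usd: str) -> dict:
    """Return an instance group using the Spot market with a maximum price."""
    return {
        "Name": f"{role.title()} (Spot)",
        "InstanceRole": role,            # "MASTER", "CORE" or "TASK"
        "InstanceType": instance_type,
        "InstanceCount": count,
        "Market": "SPOT",
        "BidPrice": bid_price_usd,       # maximum hourly price in USD, as a string
    }


# Hypothetical usage: two Spot core nodes with a USD 0.10/hour maximum price.
core = spot_group("CORE", "m5.xlarge", 2, "0.10")
```

A common compromise is to keep the master node On-Demand and use Spot only for core or task nodes, since losing the master node terminates the whole cluster.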
After the cluster is successfully created, the console shows its summary page, and its status changes to Waiting once it is ready for work.
From here on, you can run Pig or Hive scripts, query your data with Hue, and experiment with the other applications you have installed.
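Besides using Hue interactively, a Hive script stored in S3 can be submitted as a cluster step. The sketch below uses EMR's `command-runner.jar` with `hive-script --run-hive-script`, passing the input and output locations as Hive variables. The script path, the input folder, and the step name are hypothetical; the script itself would need to refer to `${INPUT}` and `${OUTPUT}`.

```python
"""Sketch: submit a Hive script from S3 as an EMR step.

Hypothetical names: the script URI, the input folder and the step name.
"""


def hive_step(script_uri: str, input_uri: str, output_uri: str) -> dict:
    return {
        "Name": "Analyze web server logs",
        "ActionOnFailure": "CONTINUE",   # keep the cluster alive if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hive-script", "--run-hive-script", "--args",
                "-f", script_uri,
                "-d", f"INPUT={input_uri}",    # available as ${INPUT} in the script
                "-d", f"OUTPUT={output_uri}",  # available as ${OUTPUT} in the script
            ],
        },
    }


def submit(cluster_id: str, step: dict, region: str) -> list:
    """Add the step to a running cluster (requires boto3 and credentials)."""
    import boto3

    emr = boto3.client("emr", region_name=region)
    return emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])["StepIds"]
```

The step's output and logs then land in the output and logs folders created at the start.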
Warning: please terminate your cluster as soon as you have finished. Never leave a cluster idle, because you are charged for its running instances whether or not they are doing any work. I was distracted by other work, left a cluster idle for 18 hours, and found that it had incurred charges of more than USD 22.
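The figures above work out to roughly USD 1.22 per hour (22 / 18) for this small cluster, which is why an idle cluster adds up quickly. The sketch below turns that into a rough estimate and shows how to terminate a cluster with boto3; the flat hourly rate is only an illustration derived from this post, and the cluster ID (of the form "j-...") is whatever your cluster's summary page shows.

```python
"""Sketch: rough idle-cost estimate, plus terminating a cluster with boto3.

Assumption: a flat hourly rate, derived from the figures in this post only.
"""


def hourly_rate(total_cost_usd: float, hours: float) -> float:
    """Average cost per hour, e.g. USD 22 over 18 idle hours."""
    return total_cost_usd / hours


def idle_cost(hours: float, rate_usd_per_hour: float) -> float:
    """Cost of leaving the cluster running for `hours` at a flat rate."""
    return hours * rate_usd_per_hour


def terminate_cluster(cluster_id: str, region: str) -> None:
    """Terminate the cluster (requires boto3 and credentials)."""
    import boto3

    emr = boto3.client("emr", region_name=region)
    emr.terminate_job_flows(JobFlowIds=[cluster_id])


if __name__ == "__main__":
    rate = hourly_rate(22, 18)
    print(f"about USD {rate:.2f}/hour, about USD {idle_cost(24 * 7, rate):.2f} for a week idle")
```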