Old post – Setting up a fake apache server log generator and Amazon Kinesis Firehose delivery stream data pipeline

### This is an old post I wrote while preparing for my AWS Certification exam.

1. Set up an EC2 instance for the fake Apache server log generator

Connect using PuTTY – Host Name = public DNS of the EC2 instance

PuTTY – SSH > Auth = key pair used for the EC2 instance

Log in as ec2-user

git clone the fake Apache log generator from GitHub

Generate 100 log lines into a .log file
$ cd Fake-Apache-Log-Generator
$ python apache-fake-log-gen.py -n 100 -o LOG
View the generated access log file

vi access_log_xxxxxxxxxx


You should see a list of fake Apache log lines in Common Log Format.
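As a rough, self-contained approximation of what the generator produces (every field value below is invented for illustration; the real script's output will differ in detail):

```python
import datetime
import random

def fake_clf_line() -> str:
    """Build one fake Apache access-log line in Common Log Format.
    All field values here are made up for illustration."""
    ip = ".".join(str(random.randint(1, 254)) for _ in range(4))
    ts = datetime.datetime.now().strftime("%d/%b/%Y:%H:%M:%S +0000")
    method = random.choice(["GET", "POST", "PUT"])
    path = random.choice(["/index.html", "/app/main", "/wp-content"])
    status = random.choice([200, 301, 404, 500])
    size = random.randint(200, 5000)
    # CLF: host ident authuser [datetime] "request" status bytes
    return f'{ip} - - [{ts}] "{method} {path} HTTP/1.0" {status} {size}'

print(fake_clf_line())
```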

2. Set up the Amazon Kinesis Agent

git clone https://github.com/awslabs/amazon-kinesis-agent.git
cd amazon-kinesis-agent
sudo ./setup --install
Create a destination S3 bucket to store the Apache logs delivered by Firehose



Leave all S3 settings at their defaults

Note down the S3 bucket URL – https://fake-logs-farmountain.s3.amazonaws.com/

(replace fake-logs-farmountain with your own bucket name)

This bucket will be used as the destination of the AWS Firehose delivery stream we will set up later on
3. Set up the AWS Firehose delivery stream
Type in the delivery stream name.
We will let the Kinesis Agent handle the Apache log transformation, so leave record transformation disabled here.



Set the S3 bucket we created previously as the Firehose delivery stream destination



Leave everything else at the defaults, but create a new IAM role to allow AWS Firehose to write to the S3 bucket.
After the status is ‘Active’, test with demo data.
It will take quite a while before you can see the demo data in your S3 bucket.
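Under the hood, the console's demo-data test simply puts records onto the stream. A minimal boto3 sketch of the same thing (the stream name fake-logs and region ap-southeast-1 are taken from the agent config in this post – adjust to yours; requires boto3 and AWS credentials to actually send):

```python
def build_record(line: str) -> dict:
    # Firehose delivers raw bytes; appending a newline keeps records
    # separable once they are concatenated into S3 objects.
    return {"Data": (line.rstrip("\n") + "\n").encode("utf-8")}

def send_line(line: str, stream_name: str = "fake-logs",
              region: str = "ap-southeast-1") -> None:
    # Requires boto3 installed and AWS credentials configured.
    import boto3
    client = boto3.client("firehose", region_name=region)
    client.put_record(DeliveryStreamName=stream_name,
                      Record=build_record(line))

print(build_record("demo record"))
```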
4. Set up the Kinesis Agent to capture the fake Apache logs and deliver them to Firehose
Go back to PuTTY

cd /etc/aws-kinesis


sudo vi agent.json

Replace the agent.json file content with the below:

{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "firehose.ap-southeast-1.amazonaws.com",
  "flows": [
    {
      "filePattern": "/home/ec2-user/Fake-Apache-Log-Generator/access_log*",
      "deliveryStream": "fake-logs",
      "dataProcessingOptions": [
        {
          "optionName": "LOGTOJSON",
          "logFormat": "COMMONAPACHELOG"
        }
      ]
    }
  ]
}
Save the file and quit vi by pressing ESC, then typing “:wq!”
5. Start the Kinesis Agent and trigger the fake log generator

sudo service aws-kinesis-agent start


Go back to the Fake-Apache-Log-Generator folder and trigger infinite log file generation

$ python apache-fake-log-gen.py -n 0 -o LOG 
After a while, you should see the fake log files in the S3 bucket

Checking a log file in S3, you will find that the fake Apache log lines have been transformed to JSON format by the Kinesis Agent.
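To see what that transformation does, here is a minimal Python sketch that mimics the agent's LOGTOJSON option with COMMONAPACHELOG – it parses one Common Log Format line into a JSON record (the output field names are my assumption and may not match the agent's output exactly):

```python
import json
import re

# Common Log Format: host ident authuser [datetime] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+)$'
)

def log_to_json(line: str) -> str:
    """Convert one Common Log Format line into a JSON string."""
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    return json.dumps(match.groupdict())

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(log_to_json(sample))
```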




