Map Reduce Service MRS Demo Environment Setup with Terraform Huawei Cloud Provider

Contents

Demo Environment Preparation

Installation of Chocolatey (Package Manager for Windows) on your local machine

Installation of Terraform with Huawei Cloud Provider (Windows OS) using Chocolatey

Configure Terraform for Map Reduce Service (MRS) Demo Environment

1.    Creating terraform workspaces

2.    Configure Terraform version and environment

3.    Initialize Terraform

Terraform deployment of Map Reduce Service (MRS) custom Demo cluster with sample data and Source Code/Scripts

1.    Create a new terraform script for MRS Cluster Deployment

2.    Run Terraform script & deploy the MRS Demo Cluster

3.    Ensure the cluster is running with the proper configuration

Configure the MRS Demo Environment

1.    Add User to IAM MRS Group

2.    IAM User Sync with MRS

3.    Manually add MRS services

4.    Sync NTP time

5.    Download the MRS client

Setup the Demo

Types of Demo available in the Demo ECS:

1.    Business Intelligence (BI) demo

2.    Spark Structured Streaming data with Kafka

3.    Integrating Apache Camel Kafka Connectors (for CDC connections to AWS S3, Azure Blob Storage, Google Pub/Sub, BigQuery, etc.)

4.    Other MRS example source code

5.    ModelArts examples (in progress)

Terminating the Demo Environment and all relevant resources


Demo Environment Preparation

Installation of Chocolatey (Package Manager for Windows) on your local machine

Note: This demo cluster costs about USD 7 per hour. Remember to terminate the cluster when it is not in use.

# Install Chocolatey (reference: https://chocolatey.org/install#individual)

  1. Open PowerShell in "Administrator" mode
  2. Ensure ExecutionPolicy is "Bypass"
  • Get-ExecutionPolicy

  • Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Check if Chocolatey installed properly

  • choco -?

Installation of Terraform with Huawei Cloud Provider (Windows OS) using Chocolatey

Install terraform with chocolatey

  • choco install terraform

# Test if terraform installed properly

  • terraform -version

 

Configure Terraform for Map Reduce Service (MRS) Demo Environment

  1. Creating terraform workspaces

    # Huawei Cloud MRS Demo workspace

  • terraform workspace new analyticCluster_env.tf

  2. Configure Terraform version and environment

    # Use an editor to write the files (in this case notepad++; make sure notepad++ is installed if you follow the exact commands below)

  • start notepad++ versions.tf

# versions.tf

terraform {
  required_providers {
    huaweicloud = {
      source  = "huaweicloud/huaweicloud"
      version = ">= 1.41.1"
    }
  }
}

# MRSdemo-env.tf

provider "huaweicloud" {
  region     = "<Region>"
  access_key = "<Access Key>"
  secret_key = "<Secret Key>"
}

## Refer to the link below for the available regions

https://developer.huaweicloud.com/intl/en-us/endpoint

## Refer to the link below to obtain your access key and secret key

https://support.huaweicloud.com/intl/en-us/iam_faq/iam_01_0618.html

# VPCtest.tf

resource "huaweicloud_vpc" "example" {
  name = "my_vpc"
  cidr = "192.168.0.0/16"
}

## Refer to the link below for the latest Terraform Huawei Cloud provider documentation

https://registry.terraform.io/providers/huaweicloud/huaweicloud/latest/docs

 

# Create the three .tf files above in the workspace folder

  3. Initialize Terraform

Note: Make sure the commands below are executed without a VPN connection

  • terraform init

Terraform deployment of Map Reduce Service (MRS) custom Demo cluster with sample data and Source Code/Scripts

  1. Create a new terraform script for MRS Cluster Deployment

    # Rename VPCtest.tf to MRSDemo.tf and put one of the scripts below into it

    Note 1: Currently MRS 1.9.2 is supported in Terraform. However, if the component you need is only in MRS 3.1 or an LTS version, you may specify version 3.1.2-LTS.3 and manually add the components (e.g. HetuEngine).

    Note 2: If you just want a simple MRS demo on version 1.9.2, change the version in the terraform script to 1.9.2. Otherwise, go ahead with the terraform script below using MRS 3.1.2-LTS.3, which requires one extra step to add the MRS service components.

    Note 3: Please ensure your VPN is turned off during the entire process of running Terraform scripts. The Huawei VPN blocks Huawei Cloud and will cause unexpected errors.

    MRS Custom Cluster Demo Env

    This demo takes about 45 minutes to prepare (20 minutes for Terraform to spin up the MRS cluster + 25 minutes to configure the relevant MRS settings and download the MRS client).
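For reference, below is a minimal sketch of what MRSDemo.tf might contain, based on the provider's huaweicloud_mapreduce_cluster resource. The availability zone, node flavors, and component list are illustrative assumptions; check the provider documentation linked earlier and adjust before applying.

# MRSDemo.tf - minimal sketch of a demo MRS cluster (flavors/AZ/components are assumptions)

resource "huaweicloud_vpc" "example" {
  name = "my_vpc"
  cidr = "192.168.0.0/16"
}

resource "huaweicloud_vpc_subnet" "example" {
  name       = "my_subnet"
  cidr       = "192.168.0.0/24"
  gateway_ip = "192.168.0.1"
  vpc_id     = huaweicloud_vpc.example.id
}

resource "huaweicloud_mapreduce_cluster" "demo" {
  availability_zone  = "<Availability Zone>"   # pick an AZ in your region
  name               = "mrs-demo"
  version            = "MRS 3.1.2-LTS.3"       # or "MRS 1.9.2" for the simple demo
  type               = "ANALYSIS"
  safe_mode          = true
  manager_admin_pass = "Adm1n@dm1n"            # referenced later in this guide
  node_admin_pass    = "Adm1n@dm1n"
  vpc_id             = huaweicloud_vpc.example.id
  subnet_id          = huaweicloud_vpc_subnet.example.id
  component_list     = ["Hadoop", "Spark2x", "HBase", "Hive"]  # assumption

  master_nodes {
    flavor            = "c6.2xlarge.4.linux.mrs"  # assumption, pick per region
    node_number       = 2
    root_volume_type  = "SAS"
    root_volume_size  = 300
    data_volume_type  = "SAS"
    data_volume_size  = 480
    data_volume_count = 1
  }

  analysis_core_nodes {
    flavor            = "c6.2xlarge.4.linux.mrs"  # assumption
    node_number       = 3
    root_volume_type  = "SAS"
    root_volume_size  = 300
    data_volume_type  = "SAS"
    data_volume_size  = 480
    data_volume_count = 1
  }
}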

  2. Run Terraform script & deploy the MRS Demo Cluster

 

  • terraform plan -out="MRSdemo.txt"
  • terraform apply "MRSdemo.txt"

 

  3. Ensure the cluster is running with the proper configuration

    Check in the console

 

Configure the MRS Demo Environment

  1. Add User to IAM MRS Group

https://support.huaweicloud.com/intl/en-us/usermanual-mrs/mrs_01_0453.html

 

  2. IAM User Sync with MRS

>> Go to MRS >> [Cluster Name] >> Dashboard >> IAM User Sync

 

Ensure the IAM users are synchronized and the relevant services are running

 

  3. Manually add MRS services

Add services manually, as terraform does not fully support MRS version 3.1.2-LTS yet. Go to "Components" >> click on "Add Service"

Add the required services for your demo and adjust the Topology accordingly.

 

Note: The rolling restart of all the services will take at least 10 to 15 minutes.

 


  4. Sync NTP time

    1. Add the security group of the MRS cluster to the ECS named "ecs-jenkins" in the console; make sure both share at least one security group in common.

       

    2. Modify ntp.conf to match the MRS master 1 & master 2 IP addresses, as sketched below
  • vi /etc/ntp.conf

# Press "i" and modify the IP addresses inside to match your master 1 & master 2 IP addresses

# Press "Esc" and type ":wq!" to save the file
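For reference, the edited server lines in /etc/ntp.conf would look roughly like this (placeholders, not real addresses):

server <master 1 ip address> prefer
server <master 2 ip address>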

  • service ntpd stop
  • /usr/sbin/ntpdate <master 1 ip address>

     

# Check if the NTP time is in sync with the MRS cluster

  • ntpstat

 

  5. Download the MRS client

 

  1. Log in to FusionInsight Manager

 

Note: The FusionInsight Manager username defaults to "admin" and the password to "Adm1n@dm1n", as pre-defined in the terraform script parameter manager_admin_pass

 

  2. Download the cluster client files

Click on the "…" beside the cluster name >> Download Client, or refer to the link below

https://support.huaweicloud.com/intl/en-us/usermanual-mrs/admin_guide_000014.html

 

 

# Once the client download completes, the package will be available on your master 1 node (in this case 192.168.0.61); you can check the details from the "Nodes" tab

 

  3. Remotely log in to the master 1 node's /tmp/FusionInsight-Client folder and copy the FusionInsight_Cluster_1_Services_Client.tar file to the ECS we created as the cluster client execution environment

 

Note: The default username is "root" and the password "Adm1n@dm1n", as defined in the terraform script parameter node_admin_pass

  • cd /tmp/FusionInsight-Client/

# Copy the private IP address of the cluster client "ecs-jenkins" and sftp to that IP

  • sftp <ecs-jenkins ip address>
  • cd /opt

# Copy the MRS client file to the ECS "ecs-jenkins" /opt folder

  • put FusionInsight_Cluster_1_Services_Client.tar
  • ssh <ecs-jenkins ip address>
  • cd /opt

# Untar the MRS client file into the /opt folder of the cluster client ECS (ecs-jenkins)

  • tar -xvf FusionInsight_Cluster_1_Services_Client.tar

# checksum the client files

  • sha256sum -c FusionInsight_Cluster_1_Services_ClientConfig.tar.sha256


# unzip the tar file – FusionInsight_Cluster_1_Services_ClientConfig.tar

  • tar -xvf FusionInsight_Cluster_1_Services_ClientConfig.tar

 

  • cd /opt/FusionInsight_Cluster_1_Services_ClientConfig

     

# Install the client in the desired folder, e.g. /opt/client/

  • ./install.sh /opt/client
  • cd /opt/client

# Source the bigdata_env configuration and run a test query to check that HDFS works

  • source bigdata_env
  • hdfs dfs -ls /

The client setup is complete.

Setup the Demo

Types of Demo available in the Demo ECS:

Note: The demo materials are in ecs-jenkins's /root folder

  1. Business Intelligence (BI) demo

Hive Demo

https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_0442.html

Hetu Engine Demo

https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_1711.html

Apache Hudi Demo

https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_24103.html

Decoupling of Storage & Computing in Hive

MRS allows you to store data in OBS and use an MRS cluster for data computing only. In this way, storage and compute are decoupled. You can use the IAM service to perform simple configurations to access OBS.

  1. Create an agency in IAM and assign it to the MRS cluster and ecs-jenkins

 

Select OBS OperateAccess

Select Global Services

MRS >> Agency >> Manage Agency, and add the agency we just created

ECS >> ecs-jenkins >> Management Information >> Agency, and assign the agency we just created

OBS >> Create OBS File System

Now, in ecs-jenkins, try an hdfs dfs query on the OBS bucket with the demo open data

  • cd /opt/client/
  • source bigdata_env
  • hdfs dfs -ls obs://sg-demo-poc/

 

  • beeline # log in to the Hive command prompt

# Create a Hive table:

CREATE EXTERNAL TABLE hdb_transactions(
  town string,
  flat_type string,
  block string,
  street_name string,
  full_road_name string,
  lat string,
  long string,
  storey_range string,
  floor_area_sqm string,
  flat_model string,
  lease_commence_date string,
  remaining_lease string,
  resale_price string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

 

# Load data into the table using the sample data from OBS

load data inpath 'obs://sg-demo-poc/HDB_openData/sample-data-with-long-and-lat.csv' into table hdb_transactions;

  • select * from hdb_transactions limit 3;
  • select count(*) from hdb_transactions;
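To take the BI demo one step further, an aggregation such as the one below (illustrative; it uses only the columns defined above) gives average resale prices by town, which also feeds nicely into the Power BI map later:

# Illustrative BI query; resale_price is stored as a string, so cast before averaging
select town, flat_type, count(*) as transactions,
       avg(cast(resale_price as double)) as avg_resale_price
from hdb_transactions
group by town, flat_type
order by avg_resale_price desc
limit 10;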

 

Other sample data in the OBS:

 

 

  2. Log in to FusionInsight >> Cluster >> Services >> HetuEngine

  3. Create the default configuration and start HetuEngine

  4. Create a hetu_admin administrator

    Go to FusionInsight Manager >> System >> Permission

  5. Create a hetu_test user account similar to the above (user group = hetuuser, hive) (role = default)

  6. Log in to the ecs-jenkins instance and log in to HetuEngine
  • cd /opt/client
  • source bigdata_env

  • hetu-cli --catalog hive --tenant default --schema default --user hetu_test

 

  • show tables;
  • select count(*) from hdb_transactions;

# You should see the same record count as in your Hive table

 

Suggestion: Try using Power BI Desktop to build a map visualization

https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_24012.html

  1. Download and install the ODBC driver from https://download.openlookeng.io/
  • cd "C:\Program Files\openLooKeng\openLooKeng ODBC Driver 64-bit\odbc_gateway\mycat\bin"
  • .\mycat.bat stop

Copy the hetu-jdbc-x.x.x-hw-ei-xxxx.jar file to C:\Program Files\openLooKeng\openLooKeng ODBC Driver 64-bit\odbc_gateway\mycat\lib\

Modify the server.xml prefix from lk to presto

Create a file with the HetuEngine username and password

Start the mycat.bat service

 

  2. Configure the openLooKeng data source

jdbc:presto://119.13.103.216:29861,159.138.84.54:29861,114.119.172.41:29861/hive/default?serviceDiscoveryMode=hsbroker

 

 

# Connect Hive/Impala/Hetu Engine with Power BI Desktop

 

# Build a globe map with the longitude and latitude of the sample data

 

  2. Spark Structured Streaming data with Kafka

Kafka Demo

https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_1031.html

  1. Note all the Kafka Broker and ZooKeeper instances

Kafka Brokers: (in this case)

192.168.0.88

192.168.0.174

192.168.0.129

Port No: 9092 / 21007 (With Kerberos Security)

 

ZooKeeper: (in this case)

192.168.0.44

192.168.0.222

192.168.0.198

Port No: 2181

 

  2. Create a topic in Kafka

    # Log in to the ecs-jenkins ECS client (username: root, password: Ke0ngh@n)

  • cd /opt/client
  • source bigdata_env
  • kafka-topics.sh --create --zookeeper <zookeeper ip>:2181/kafka --partitions 2 --replication-factor 2 --topic DemoPOC
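Optionally, sanity-check the topic from the same client before wiring up CDM. A quick sketch (the broker IP is one of the brokers noted above; adjust to your cluster):

# Describe the topic to confirm partitions and replicas
  • kafka-topics.sh --describe --zookeeper <zookeeper ip>:2181/kafka --topic DemoPOC

# Smoke test: type a line into the producer and watch it appear in the consumer
  • kafka-console-producer.sh --broker-list 192.168.0.88:9092 --topic DemoPOC
  • kafka-console-consumer.sh --bootstrap-server 192.168.0.88:9092 --topic DemoPOC --from-beginning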

  3. Simulate streaming data being written to the topic using CDM
    1. Buy a new Cloud Data Migration (CDM) server; please ensure the CDM instance is in the same VPC/subnet/security group as the MRS cluster

    2. Create link for OBS

    3. Create a link for MRS-Kafka

    4. Simulate a streaming data from csv file using CDM


      Note: 95,522 records written to Kafka topic "DemoPOC"

       

    5. Run Spark to read data from the Kafka topic "DemoPOC" and write to an OBS bucket
    1. Set up the Huawei dependency JAR file environment
      1. Add the mirror repository address from the guide below to the project pom.xml file (a sketch follows)

    https://support.huaweicloud.com/intl/en-us/devg-lts-mrs/mrs_07_010002.html
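A sketch of the repository entry for pom.xml is below; the mirror URL is an assumption based on the guide linked above, so verify it there before building:

<!-- pom.xml excerpt: Huawei Cloud SDK mirror (URL per the guide above; verify before use) -->
<repositories>
  <repository>
    <id>huaweicloudsdk</id>
    <url>https://repo.huaweicloud.com/repository/maven/huaweicloudsdk/</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>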

    2. In this case, the example projects are inside /root/huaweicloud-mrs-example/src/spark-examples/

     

    3. Build the JAR file using Maven
  • mvn clean package

     

     

    4. Submit the Spark job with spark-submit (the second command is a variant with a different trigger interval and consumer group):

    spark-submit --master yarn --deploy-mode client --num-executors 3 --jars $(files=(/opt/client/Spark2x/spark/jars/*.jar); IFS=,; echo "${files[*]}") --class com.huawei.bigdata.spark.examples.KafkaToOBS /root/huaweicloud-mrs-example/src/spark-examples/sparknormal-examples/SparkStructuredStreamingScalaExample/target/SparkStreamingKafka010Example-1.0.jar hdfs://hacluster/tmp/ 192.168.0.16:9092 DemoPOC 5 group obs://sg-demo-poc/KafkaToOBS/

    spark-submit --master yarn --deploy-mode client --num-executors 3 --class com.huawei.bigdata.spark.examples.KafkaToOBS --jars $(files=(/opt/client/Spark2x/spark/jars/*.jar); IFS=,; echo "${files[*]}") SparkStreamingKafka010Example-1.0.jar hdfs://hacluster/tmp/ 192.168.0.16:9092 DemoPOC 10 groupaaa obs://sg-demo-poc/KafkaToOBS/

 

  3. Integrating Apache Camel Kafka Connectors (for CDC connections to AWS S3, Azure Blob Storage, Google Pub/Sub, BigQuery, etc.)

# Test if the normal kafka connect is working fine

  • connect-standalone.sh $KAFKA_HOME/config/connect-standalone.properties $KAFKA_HOME/config/connect-file-sink.properties

# Add consumer.auto.offset.reset=earliest to connect-standalone.properties
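For reference, a minimal sketch of connect-file-sink.properties pointed at our topic (the stock Kafka example uses the same sink file name that is checked further below):

# $KAFKA_HOME/config/connect-file-sink.properties (minimal sketch)
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=DemoPOC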

 

# Camel S3 Source Connector

connect-standalone.sh $KAFKA_HOME/config/connect-standalone.properties /opt/client/Kafka/kafka/config/camel-properties/docs/examples/CamelAwss3sourceSourceConnector.properties

 

 

# The Kafka Connect file sink should write our HDB transactions data into the local file test.sink.txt

 

# The number of lines in the file should equal 95,522 minus the header

  • wc -l test.sink.txt

 

  • mvn clean package # build the JAR files

    Put the JAR file into the Kafka jars folder

    Configure the camel-xx.properties file

     

  • $KAFKA_HOME/bin/connect-standalone.sh $KAFKA_HOME/config/connect-standalone.properties $KAFKA_HOME/camel-connectors/camel-kafka-connector/examples/CamelFileSinkConnector.properties

 

  4. Other MRS example source code

On the ecs-jenkins server there is Huawei Cloud MRS example code covering HBase, HDFS, Hive, Kafka, MapReduce, HetuEngine, Spark, Elasticsearch, ClickHouse, etc. You can get started in minutes using Maven, which is already installed on the server.

 

The example code below includes Java, Scala, and Python versions. Pick one language for each demo project you build.
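For example, to build one of the sample projects with the pre-installed Maven (the project path below is illustrative; list /root/huaweicloud-mrs-example/src to see them all):

  • cd /root/huaweicloud-mrs-example/src/spark-examples/sparknormal-examples/SparkStructuredStreamingScalaExample
  • mvn clean package

# The built JAR will be in the project's target/ directory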

 

 

  5. ModelArts examples (in progress)

    ExeML

 

Official ModelArts Examples

Terminating the Demo Environment and all relevant resources

After the demo is completed, remember to terminate the cluster and all related resources

  • terraform destroy

 
