Contents
Demo Environment Preparation
Installation of Chocolatey (Package Manager for Windows) on your local machine
Installation of Terraform with Huawei Cloud Provider (Windows OS) using Chocolatey
Configure Terraform for Map Reduce Service (MRS) Demo Environment
1. Creating terraform workspaces
2. Configure Terraform version and environment
3. Initialize Terraform
Terraform deployment of Map Reduce Services (MRS) custom Demo cluster with sample data and Source Code/Scripts
1. Create a new terraform script for MRS Cluster Deployment
2. Run Terraform script & deploy the MRS Demo Cluster
3. Ensure the cluster is running with the proper configuration
Configure the MRS demo Env
1. Add User to IAM MRS Group
2. IAM User Sync with MRS
3. Manually add MRS services
4. Sync NTP time
5. Download the MRS client
Types of Demo available in the Demo ECS:
1. Business Intelligence (BI) demo
2. Spark Structured Streaming data with Kafka
3. Integrating Apache Camel Kafka Connectors
4. Other MRS example source code
5. ModelArts examples (in progress)
Terminating the Demo Environment and all relevant resources
Demo Environment Preparation
Installation of Chocolatey (Package Manager for Windows) on your local machine
Note: This Demo Cluster will cost about USD 7 per hour. Do remember to terminate the cluster when it is not in use.
# Install Chocolatey – reference: https://chocolatey.org/install#individual
- Open PowerShell in "administrator" mode
- Ensure ExecutionPolicy is "Bypass"
- Get-ExecutionPolicy
- Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
# Check if Chocolatey is installed properly
- choco -?
Installation of Terraform with Huawei Cloud Provider (Windows OS) using Chocolatey
Install Terraform with Chocolatey
- choco install terraform
# Test if Terraform is installed properly
- terraform -version
Configure Terraform for Map Reduce Service (MRS) Demo Environment
1. Creating terraform workspaces
# Huawei Cloud MRS Demo workspace
- terraform workspace new analyticCluster_env
2. Configure Terraform version and environment
# Use an editor to write the files (in this case Notepad++; make sure Notepad++ is installed if you follow the exact command below)
- start notepad++ versions.tf
# versions.tf
terraform {
  required_providers {
    huaweicloud = {
      source  = "huaweicloud/huaweicloud"
      version = ">= 1.41.1"
    }
  }
}
# MRSdemo-env.tf
provider "huaweicloud" {
  region     = "<Region>"
  access_key = "<Access Key>"
  secret_key = "<Secret Key>"
}
## Refer to the link below for the available regions
https://developer.huaweicloud.com/intl/en-us/endpoint
## Refer to the link below to get your access key and secret key
https://support.huaweicloud.com/intl/en-us/iam_faq/iam_01_0618.html
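As an alternative to hard-coding credentials in MRSdemo-env.tf, the provider can also pick them up from environment variables, which keeps secrets out of the .tf files. A minimal PowerShell sketch, assuming the standard huaweicloud provider variable names:
# Set Huawei Cloud credentials for the current PowerShell session only
$env:HW_REGION_NAME = "<Region>"
$env:HW_ACCESS_KEY  = "<Access Key>"
$env:HW_SECRET_KEY  = "<Secret Key>"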
# VPCtest.tf
resource "huaweicloud_vpc" "example" {
  name = "my_vpc"
  cidr = "192.168.0.0/16"
}
## Refer to the link below for the latest Terraform Huawei Cloud Provider documentation
https://registry.terraform.io/providers/huaweicloud/huaweicloud/latest/docs
# Create the 3 .tf files above in the workspace folder
3. Initialize Terraform
Note: Make sure the commands below are executed without a VPN connection
- terraform init
Terraform deployment of Map Reduce Services (MRS) custom Demo cluster with sample data and Source Code/Scripts
1. Create a new terraform script for MRS Cluster Deployment
# Rename VPCtest.tf to MRSDemo.tf and use one of the scripts below
Note 1: Currently MRS 1.9.2 is supported in Terraform. If you just want a simple MRS demo on version 1.9.2, simply change the version in the terraform script to 1.9.2. However, if the component you need is only available in MRS 3.1 or an LTS version, you may specify version MRS 3.1.2-LTS.3 and manually add the components (e.g. HetuEngine). The terraform script below uses MRS 3.1.2-LTS.3, which requires one extra step to add MRS service components.
Note 2: Please ensure your VPN is turned off during the entire process of running the Terraform scripts. The Huawei VPN blocks Huawei Cloud and will cause unexpected errors.
MRS Custom Cluster Demo Env
This demo will take about 45 minutes to prepare (20 minutes for Terraform to spin up the MRS cluster + 25 minutes to configure the relevant MRS settings and download the MRS client)
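For reference, a simplified MRSDemo.tf sketch of the provider's huaweicloud_mapreduce_cluster resource is shown below. All values are illustrative and the node groups are cut down (the actual demo provisions a custom cluster with more node groups and components); consult the provider documentation linked earlier for the full argument list.
# MRSDemo.tf – simplified sketch, illustrative values only
resource "huaweicloud_mapreduce_cluster" "demo" {
  name               = "mrs-demo"
  type               = "ANALYSIS"
  version            = "MRS 3.1.2-LTS.3"
  availability_zone  = "<Availability Zone>"
  vpc_id             = huaweicloud_vpc.example.id
  subnet_id          = "<Subnet ID>"
  component_list     = ["Hadoop", "Spark2x", "Hive"]
  manager_admin_pass = "Adm1n@dm1n"
  node_admin_pass    = "Adm1n@dm1n"

  master_nodes {
    flavor            = "<Master Node Flavor>"
    node_number       = 2
    root_volume_type  = "SAS"
    root_volume_size  = 300
    data_volume_type  = "SAS"
    data_volume_size  = 480
    data_volume_count = 1
  }

  analysis_core_nodes {
    flavor            = "<Core Node Flavor>"
    node_number       = 2
    root_volume_type  = "SAS"
    root_volume_size  = 300
    data_volume_type  = "SAS"
    data_volume_size  = 480
    data_volume_count = 1
  }
}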
2. Run Terraform script & deploy the MRS Demo Cluster
- terraform plan -out="MRSdemo.txt"
- terraform apply "MRSdemo.txt"
3. Ensure the cluster is running with the proper configuration
Check in the console
Configure the MRS demo Env
1. Add User to IAM MRS Group
https://support.huaweicloud.com/intl/en-us/usermanual-mrs/mrs_01_0453.html
2. IAM User Sync with MRS
>> Go to MRS >> [Cluster Name] >> Dashboard >> IAM User Sync
Ensure IAM users are synchronized and the relevant services are running
3. Manually add MRS services
Add services manually, as terraform does not fully support MRS version 3.1.2-LTS yet. Go to "Components" >> click on "Add Service"
Add the required services for your demo and adjust the Topology accordingly.
Note: This will take at least 10 to 15 minutes for a rolling restart of all the services.
4. Sync NTP time
- Add the security group of the MRS cluster to the ECS named "ecs-jenkins" in the console; make sure both have at least one security group in common.
- Modify ntp.conf to match the MRS master 1 & master 2 IP addresses
- vi /etc/ntp.conf
# Press "i" and modify the IP addresses inside to match your master 1 & master 2 IP addresses
# Press "Esc" and type ":wq!" to save the file
- service ntpd stop
- /usr/sbin/ntpdate <master 1 ip address>
# Check if the NTP time is in sync with the MRS cluster
- ntpstat
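For illustration, the edited server lines in /etc/ntp.conf could look like the following (placeholder addresses; use the master node IPs shown in your cluster's "Nodes" tab):
# /etc/ntp.conf – use the MRS master nodes as time sources
server <master 1 ip address> prefer
server <master 2 ip address>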
5. Download the MRS client
- Login to FusionInsight Manager
Note: The FusionInsight Manager user name defaults to "admin" and the password is "Adm1n@dm1n", as pre-defined in the terraform script parameter – manager_admin_pass
- Download the cluster client files
Click on the "…" beside the cluster name >> Download Client, or refer to the link below
https://support.huaweicloud.com/intl/en-us/usermanual-mrs/admin_guide_000014.html
# Once the client download completes, it will be available on your master 1 node (in this case 192.168.0.61); you may check the details from the "Nodes" tab
- Remote login to the Master 1 node's /tmp/FusionInsight-Client folder and copy the FusionInsight_Cluster_1_Services_Client.tar file to the ECS we created as the cluster client execution environment
Note: Default username "root" and password "Adm1n@dm1n", as defined in the terraform script parameter – node_admin_pass
- cd /tmp/FusionInsight-Client/
# Copy the private IP address of the cluster client "ecs-jenkins" and sftp to that IP
- sftp <ecs-jenkins ip address>
- cd /opt
# Copy the MRS client file to the /opt folder of ECS "ecs-jenkins"
- put FusionInsight_Cluster_1_Services_Client.tar
- ssh <ecs-jenkins ip address>
- cd /opt
# Unpack the MRS client tar file into the /opt folder of the cluster client ECS (ecs-jenkins)
- tar -xvf FusionInsight_Cluster_1_Services_Client.tar
# Checksum the client files
- sha256sum -c FusionInsight_Cluster_1_Services_ClientConfig.tar.sha256
# Unpack the tar file – FusionInsight_Cluster_1_Services_ClientConfig.tar
- tar -xvf FusionInsight_Cluster_1_Services_ClientConfig.tar
- cd /opt/FusionInsight_Cluster_1_Services_ClientConfig
# Install the client into the desired folder, e.g. /opt/client/
- ./install.sh /opt/client
- cd /opt/client
# Source the bigdata_env configuration and run a test query to confirm HDFS works
- source bigdata_env
- hdfs dfs -ls /
The client setup is complete.
Setup the Demo
Types of Demo available in the Demo ECS:
Note: (ecs-jenkins's /root folder)
1. Business Intelligence (BI) demo
Hive Demo
https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_0442.html
Hetu Engine Demo
https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_1711.html
Apache Hudi Demo
https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_24103.html
Decoupling of Storage & Computing in Hive
MRS allows you to store data in OBS and use an MRS cluster for data computing only. In this way, storage and compute are decoupled. You can use the IAM service to perform simple configurations to access OBS.
- Create an Agency in IAM and assign it to the MRS cluster and ecs-jenkins
Select OBS OperateAccess
Select Global Services
MRS >> Agency >> Manage Agency and add the agency we just created
ECS >> ecs-jenkins >> Management Information >> Agency and assign the agency we just created.
OBS >> Create OBS File System
Now in ecs-jenkins, try an hdfs dfs query on the OBS bucket with the demo open data
- cd /opt/client/
- source bigdata_env
- hdfs dfs -ls obs://sg-demo-poc/
- beeline # login to the Hive command prompt
# Create a Hive table:
CREATE EXTERNAL TABLE hdb_transactions(
  town string,
  flat_type string,
  block string,
  street_name string,
  full_road_name string,
  lat string,
  long string,
  storey_range string,
  floor_area_sqm string,
  flat_model string,
  lease_commence_date string,
  remaining_lease string,
  resale_price string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
# Load data into the table using the sample data from OBS
load data inpath 'obs://sg-demo-poc/HDB_openData/sample-data-with-long-and-lat.csv' into table hdb_transactions;
- select * from hdb_transactions limit 3;
- select count(*) from hdb_transactions;
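As a further BI-style sanity check, here is an illustrative aggregate over the table defined above (resale_price is stored as a string, hence the explicit cast):
-- Average resale price per town, highest first
SELECT town,
       COUNT(*) AS transactions,
       AVG(CAST(resale_price AS DOUBLE)) AS avg_resale_price
FROM hdb_transactions
GROUP BY town
ORDER BY avg_resale_price DESC
LIMIT 10;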
Other sample data in the OBS:
- Login to FusionInsight Manager >> Cluster >> Services >> HetuEngine
- Create the default configuration and start HetuEngine
- Creating a Hetu_admin administrator
Go to FusionInsight Manager >> System >> Permission
- Create a Hetu_test user account similar to the above (user group = hetuuser, hive) (role = default)
- Login to the ecs-jenkins instance and log in to HetuEngine
- cd /opt/client
- source bigdata_env
- hetu-cli --catalog hive --tenant default --schema default --user hetu_test
- show tables;
- select count(*) from hdb_transactions;
# You will see the same record count as in your Hive table
Suggestion: Try using Power BI Desktop to build a map visualization
https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_24012.html
- Download and install the ODBC driver from https://download.openlookeng.io/
- cd "C:\Program Files\openLooKeng\openLooKeng ODBC Driver 64-bit\odbc_gateway\mycat\bin"
- .\mycat.bat stop
- Copy the hetu-jdbc-x.x.x-hw-ei-xxxx.jar file to C:\Program Files\openLooKeng\openLooKeng ODBC Driver 64-bit\odbc_gateway\mycat\lib\
- Modify the Server.xml prefix from lk to presto
- Create a file with the HetuEngine username and password
- Start the mycat.bat service
- Configure the openLooKeng data source
jdbc:presto://119.13.103.216:29861,159.138.84.54:29861,114.119.172.41:29861/hive/default?serviceDiscoveryMode=hsbroker
# Connect Hive/Impala/HetuEngine with Power BI Desktop
# Build a globe map with the longitude and latitude of the sample data
2. Spark Structured Streaming data with Kafka
Kafka Demo
https://support.huaweicloud.com/intl/en-us/cmpntguide-lts-mrs/mrs_01_1031.html
- Note all the Kafka Broker and ZooKeeper instances
Kafka Brokers: (in this case)
192.168.0.88
192.168.0.174
192.168.0.129
Port No: 9092 / 21007 (With Kerberos Security)
ZooKeeper: (in this case)
192.168.0.44
192.168.0.222
192.168.0.198
Port No: 2181
- Create a topic in Kafka
# Login to the ecs-jenkins ECS client (username: root, password: Ke0ngh@n)
- cd /opt/client
- source bigdata_env
- kafka-topics.sh --create --zookeeper <zookeeper ip>:2181/kafka --partitions 2 --replication-factor 2 --topic DemoPOC
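To sanity-check the new topic before wiring up CDM, you can do a quick console produce/consume round trip. A sketch, assuming the non-Kerberos port 9092 and the first broker noted above:
# Produce a test message (type a line, then Ctrl+C to exit)
- kafka-console-producer.sh --broker-list 192.168.0.88:9092 --topic DemoPOC
# Read the topic back from the beginning (Ctrl+C to exit)
- kafka-console-consumer.sh --bootstrap-server 192.168.0.88:9092 --topic DemoPOC --from-beginning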
- Simulate streaming data written to the topic using CDM
- Buy a new Cloud Data Migration (CDM) server – please ensure the CDM instance is in the same VPC/subnet/security group as the MRS cluster
- Create a link for OBS
- Create a link for MRS-Kafka
- Simulate streaming data from the csv file using CDM
Note: 95,522 records written to Kafka topic "DemoPOC"
- Run Spark to read data from Kafka topic "DemoPOC" and write to the OBS bucket
- Setup the Huawei dependency JAR files environment
- Add the following mirror repository address to the project pom.xml file (see the pom.xml sketch after this list)
https://support.huaweicloud.com/intl/en-us/devg-lts-mrs/mrs_07_010002.html
- In this case, inside /root/huaweicloud-mrs-example/src/spark-examples/
- Build the JAR file using Maven
- mvn clean package
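For reference, the repository entry added to pom.xml might look like the sketch below. The mirror URL is an assumption based on Huawei Cloud's public SDK Maven repository; verify the exact address in the guide linked above.
<!-- Huawei Cloud SDK Maven mirror for the MRS example dependencies (illustrative) -->
<repositories>
  <repository>
    <id>huaweicloudsdk</id>
    <url>https://repo.huaweicloud.com/repository/maven/huaweicloudsdk/</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>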
# Submit the structured streaming job; the trailing arguments are: <checkpoint dir> <kafka broker> <topic> <interval seconds> <consumer group> <OBS output path>
spark-submit --master yarn --deploy-mode client --num-executors 3 --jars "$(files=(/opt/client/Spark2x/spark/jars/*.jar); IFS=,; echo "${files[*]}")" --class com.huawei.bigdata.spark.examples.KafkaToOBS /root/huaweicloud-mrs-example/src/spark-examples/sparknormal-examples/SparkStructuredStreamingScalaExample/target/SparkStreamingKafka010Example-1.0.jar hdfs://hacluster/tmp/ 192.168.0.16:9092 DemoPOC 5 group obs://sg-demo-poc/KafkaToOBS/
# Alternatively, run from the directory containing the built jar
spark-submit --master yarn --deploy-mode client --num-executors 3 --class com.huawei.bigdata.spark.examples.KafkaToOBS --jars "$(files=(/opt/client/Spark2x/spark/jars/*.jar); IFS=,; echo "${files[*]}")" SparkStreamingKafka010Example-1.0.jar hdfs://hacluster/tmp/ 192.168.0.16:9092 DemoPOC 10 groupaaa obs://sg-demo-poc/KafkaToOBS/
3. Integrating Apache Camel Kafka Connectors (for connections to CDC sources, AWS S3, Azure Blob Storage, Google Pub/Sub, BigQuery, etc.)
# Test if the normal Kafka Connect file sink is working fine
- connect-standalone.sh $KAFKA_HOME/config/connect-standalone.properties $KAFKA_HOME/config/connect-file-sink.properties
- Add consumer.auto.offset.reset=earliest to connect-standalone.properties
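For reference, the stock connect-file-sink.properties shipped with Kafka only needs the topic and output file set; a minimal sketch using the topic created above:
# connect-file-sink.properties – dump topic records into a local file
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=DemoPOC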
# Camel S3 Source Connector
- connect-standalone.sh $KAFKA_HOME/config/connect-standalone.properties /opt/client/Kafka/kafka/config/camel-properties/docs/examples/CamelAwss3sourceSourceConnector.properties
# The Kafka Connect file sink should write our hdb_transactions data into the local file – test.sink.txt
# The number of lines in the file should equal 95,522 minus the header
- wc -l test.sink.txt
- mvn clean package # to build the jar files
- Put the jar file into the Kafka jars folder
- Configure the camel-xx.properties file
- $KAFKA_HOME/bin/connect-standalone.sh $KAFKA_HOME/config/connect-standalone.properties $KAFKA_HOME/camel-connectors/camel-kafka-connector/examples/CamelFileSinkConnector.properties
4. Other MRS example source code
The ecs-jenkins server contains Huawei Cloud MRS example code covering HBase, HDFS, Hive, Kafka, MapReduce, HetuEngine, Spark, Elasticsearch, ClickHouse, etc. You can get started in minutes using Maven, which is already installed on the server.
The example code may include Java, Scala, or Python; pick one language for each demo project you build.
5. ModelArts examples (in progress)
ExeML
Official ModelArts Examples
Terminating the Demo Environment and all relevant resources
After the demo is completed, remember to terminate the cluster and all related resources
- terraform destroy