Sqoop data import and basic data validation in HDFS/Hive/Impala/Hue

  1. Sqoop1 and Sqoop2

a. Feature differences between Sqoop1 and Sqoop2 (Sqoop2 is being deprecated, so Sqoop1 is the recommended choice)

b. Listing the available commands in Sqoop1: sqoop help
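The help tool has two forms: `sqoop help` lists every tool, and `sqoop help <tool>` prints the options of a single tool. Here is a minimal sketch that wraps both in a helper. The name `sqoop_usage` is hypothetical (it is not part of Sqoop), and the sketch assumes the `sqoop` CLI is on your `PATH`.

```shell
# Hypothetical helper (not part of Sqoop): wraps Sqoop1's built-in help tool.
sqoop_usage() {
  if [ -n "${1:-}" ]; then
    # "sqoop help <tool>" prints the options of one tool, e.g. import
    sqoop help "$1"
  else
    # "sqoop help" lists all available tools
    sqoop help
  fi
}
```

Running `sqoop_usage` lists the tools. Running `sqoop_usage import-all-tables` shows the options used in the import command later in this post.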

# Check the Sqoop version and open the matching User Guide, Developer Guide, and API documentation

https://sqoop.apache.org/docs/1.4.6/index.html
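`sqoop version` prints the installed release. On the QuickStart VM this looks something like `Sqoop 1.4.6-cdh5...`. The sketch below extracts the numeric `x.y.z` part and builds a documentation URL from it. It assumes that other releases follow the same `docs/<version>/index.html` layout as the 1.4.6 link above. `sqoop_docs_url` is a hypothetical name.

```shell
# Hypothetical helper: build the docs URL for the installed Sqoop release.
# Assumes the "sqoop version" stdout contains a line like "Sqoop 1.4.6-cdh5..."
# and that the docs use the same URL layout as the 1.4.6 link.
sqoop_docs_url() {
  # Keep only the numeric x.y.z part; warnings on stderr are discarded
  ver=$(sqoop version 2>/dev/null |
    sed -n 's/^Sqoop \([0-9][0-9.]*[0-9]\).*/\1/p' |
    head -n 1)
  if [ -z "$ver" ]; then
    echo "could not determine Sqoop version" >&2
    return 1
  fi
  echo "https://sqoop.apache.org/docs/${ver}/index.html"
}
```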

# Connect to the database server and explore the database
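Sqoop1 can explore the source database before anything is imported. The standard tools for this are `list-databases`, `list-tables`, and `eval`. The sketch below wraps them in hypothetical helpers and reuses the QuickStart connection settings from the import command in this post. Note that passing `--password` on the command line exposes it in the process list. Outside a sandbox VM, `-P` (interactive prompt) or `--password-file` is safer.

```shell
# Connection settings taken from the import command in this post (QuickStart VM)
DB_SERVER_URL="jdbc:mysql://quickstart:3306"
DB_NAME="retail_db"
DB_USER="retail_dba"
DB_PASS="cloudera"

# List every database visible to DB_USER on the MySQL server
sqoop_list_dbs() {
  sqoop list-databases --connect "$DB_SERVER_URL" \
    --username "$DB_USER" --password "$DB_PASS"
}

# List the tables in retail_db
sqoop_list_tables() {
  sqoop list-tables --connect "$DB_SERVER_URL/$DB_NAME" \
    --username "$DB_USER" --password "$DB_PASS"
}

# Run one ad-hoc SQL statement against retail_db and print the result
sqoop_query() {
  sqoop eval --connect "$DB_SERVER_URL/$DB_NAME" \
    --username "$DB_USER" --password "$DB_PASS" --query "$1"
}
```

For example, `sqoop_query "SELECT COUNT(*) FROM orders"` counts the rows at the source. That gives you a number to compare against the imported table later.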

# Ingest data from the RDBMS into HDFS and Hive

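sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import

This command imports every table in retail_db using one mapper per table (-m 1). It writes each table as Snappy-compressed Parquet files under /user/hive/warehouse and creates a matching Hive table for it (--hive-import).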

# Check that the data was imported into HDFS
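From a terminal on the VM, `hadoop fs -ls` shows what the import wrote. Each imported table should appear as its own directory under the path passed to `--warehouse-dir`. The helper name below is hypothetical.

```shell
# Warehouse directory passed to --warehouse-dir in the import command
WAREHOUSE_DIR="/user/hive/warehouse"

# Hypothetical helper: with no argument, list one directory per imported table;
# with a table name, list that table's files recursively
hdfs_check_import() {
  if [ -z "${1:-}" ]; then
    hadoop fs -ls "$WAREHOUSE_DIR"
  else
    hadoop fs -ls -R "$WAREHOUSE_DIR/$1"
  fi
}
```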

# Use the Hive CLI to check the imported data
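The Hive CLI can run statements non-interactively with `hive -e`, and `-S` suppresses its log output. The import used `--hive-import` without naming a target database, so the tables should land in Hive's `default` database. Below is a minimal sketch with hypothetical helpers; the preview defaults to the `categories` table.

```shell
# Hypothetical helper: list Hive tables and preview a few rows of one table.
# Defaults to "categories", one of the retail_db tables.
hive_preview() {
  tbl="${1:-categories}"
  hive -e "SHOW TABLES; SELECT * FROM ${tbl} LIMIT 10;"
}

# Hypothetical helper: print only the row count of one table,
# for comparison with the count at the source database
hive_row_count() {
  hive -S -e "SELECT COUNT(*) FROM $1;"
}
```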

# Or use Impala to check the imported content
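Impala caches table metadata. Tables created through Hive (which is what `--hive-import` does) are generally not visible to Impala until you run `INVALIDATE METADATA`. The sketch below uses `impala-shell -q` with one statement per call. The `IMPALA_SHELL` variable and the helper name are our own conventions. It also assumes `impala-shell` reaches the local impalad with its default settings.

```shell
# Path to impala-shell; can be overridden if it is not on PATH
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

# Hypothetical helper: make Impala see the Hive tables, then preview one of them
impala_preview() {
  tbl="${1:-categories}"
  # Reload metadata for tables created outside Impala (e.g. by Sqoop/Hive)
  "$IMPALA_SHELL" -q "INVALIDATE METADATA"
  "$IMPALA_SHELL" -q "SHOW TABLES"
  "$IMPALA_SHELL" -q "SELECT * FROM ${tbl} LIMIT 10"
}
```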

# Use the Hue UI for data analysis
URL: http://quickstart.cloudera:8888
Username/password: cloudera/cloudera

# Go to the File Browser to check that the data has been imported

# Open the Impala Query Editor and run queries against the imported tables

# Most popular products
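One way to rank the most popular products is to sum the quantity ordered per product. The query below is a sketch that assumes the standard `retail_db` column names (`order_items.order_item_product_id`, `order_item_quantity`, `products.product_id`, `product_name`). You can paste the SQL into the Impala Query Editor as-is, or use the wrapper (a hypothetical name) to run it through `impala-shell`.

```shell
# Path to impala-shell; can be overridden if it is not on PATH
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

# Top 10 products by total units ordered (assumes the retail_db schema)
TOP_PRODUCTS_SQL="
SELECT p.product_id, p.product_name,
       SUM(oi.order_item_quantity) AS units_ordered
FROM order_items oi
JOIN products p ON oi.order_item_product_id = p.product_id
GROUP BY p.product_id, p.product_name
ORDER BY units_ordered DESC
LIMIT 10"

# Hypothetical helper: run the query through impala-shell
top_products() {
  "$IMPALA_SHELL" -q "$TOP_PRODUCTS_SQL"
}
```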
