Launch Spark Shell on Hortonworks Sandbox

#To launch the Spark shell on the Hortonworks Sandbox, first open the sandbox's web terminal in your browser (replace Host with your sandbox's hostname or IP):
http://Host:4200/
#Login details:
username: root or maria_dev
password: hadoop (for root) or maria_dev (for maria_dev)
#The focus here is Spark, but first, a list of commonly used Hadoop filesystem commands (a quick usage example follows the list):
$hadoop fs -help            # show usage for the fs commands
$hadoop fs -ls              # list files in an HDFS directory
$hadoop fs -cat             # print a file's contents to stdout
$hadoop fs -get             # download a file from HDFS to the local filesystem
$hadoop fs -put             # upload a local file to HDFS
$hadoop fs -mkdir           # create a directory in HDFS
$hadoop fs -cp              # copy files within HDFS
$hadoop fs -copyFromLocal   # like -put: copy from the local filesystem into HDFS
$hadoop fs -copyToLocal     # like -get: copy from HDFS to the local filesystem
$hadoop fs -rm              # delete a file
$hadoop fs -rm -r           # delete a directory recursively
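#As a quick illustration, here is a minimal round trip through HDFS; the file name words.txt and the directory /user/root/demo below are hypothetical:
$echo "hello spark" > words.txt              # create a small local file (hypothetical name)
$hadoop fs -mkdir -p /user/root/demo         # make a demo directory in HDFS (hypothetical path)
$hadoop fs -put words.txt /user/root/demo/   # upload the local file to HDFS
$hadoop fs -ls /user/root/demo               # verify the file is there
$hadoop fs -cat /user/root/demo/words.txt    # print its contents
$hadoop fs -rm -r /user/root/demo            # clean up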
#To launch a Spark shell:
$spark-shell
#Just a brief introduction to the resilient distributed dataset (RDD), the core abstraction in Spark.
An RDD is simply a distributed collection of elements. In Spark, all work is expressed as creating
new RDDs, transforming existing RDDs, or calling operations on RDDs to compute a result. Under the
hood, Spark automatically distributes the data contained in RDDs across your cluster and parallelizes
the operations you perform on them. (This definition comes from the book Learning Spark, a good place
to start learning Spark.)
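#To make this concrete, below is a minimal sketch to try at the spark-shell prompt: it creates an RDD from a local collection, applies two transformations, and runs an action. The variable names and values are made up for illustration; sc is the SparkContext that spark-shell provides.
scala> val nums = sc.parallelize(1 to 10)           // create an RDD from a local collection
scala> val squares = nums.map(n => n * n)           // transformation: returns a new RDD, evaluated lazily
scala> val evenSquares = squares.filter(_ % 2 == 0) // another transformation, still no work done
scala> evenSquares.collect()                        // action: triggers the computation on the cluster
// returns Array(4, 16, 36, 64, 100)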

Next time, I will use a practical data set to demonstrate use cases for the topics below:
  1. Spark SQL + DataFrames
  2. Streaming
  3. MLlib (Machine Learning)
  4. GraphX (Graph Computation)
  5. Deep Learning Pipelines (a new open-source library that builds on Apache Spark's ML Pipelines)
