#To reach the Hortonworks Sandbox web terminal (from which you can launch the Spark shell), open the URL below in your browser (replace Host with your sandbox host, e.g. localhost):
http://Host:4200/
#Login details (either account works):
username: root       (password: hadoop)
username: maria_dev  (password: maria_dev)
#The focus here is Spark, but first, some commonly used Hadoop (HDFS) commands:
$ hadoop fs -help            # show usage for all fs commands
$ hadoop fs -ls              # list files in an HDFS directory
$ hadoop fs -cat             # print a file's contents to stdout
$ hadoop fs -get             # copy a file from HDFS to the local filesystem
$ hadoop fs -put             # copy a local file into HDFS
$ hadoop fs -mkdir           # create a directory in HDFS
$ hadoop fs -cp              # copy files within HDFS
$ hadoop fs -copyFromLocal   # like -put: copy from the local filesystem into HDFS
$ hadoop fs -copyToLocal     # like -get: copy from HDFS to the local filesystem
$ hadoop fs -rm              # delete a file
$ hadoop fs -rm -r           # delete a directory recursively
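A typical round trip with these commands might look like this (the directory and file names are hypothetical, and a running HDFS is assumed):

```shell
# Create a directory in HDFS and upload a local file into it.
$ hadoop fs -mkdir /user/maria_dev/demo
$ hadoop fs -put ./sales.csv /user/maria_dev/demo/

# Inspect what landed there.
$ hadoop fs -ls /user/maria_dev/demo
$ hadoop fs -cat /user/maria_dev/demo/sales.csv

# Pull the file back down to the local filesystem, then clean up.
$ hadoop fs -get /user/maria_dev/demo/sales.csv ./sales-copy.csv
$ hadoop fs -rm -r /user/maria_dev/demo
```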
#To launch the Spark shell:
$ spark-shell
# A brief introduction to the resilient distributed dataset (RDD), the core abstraction in Spark.
An RDD is simply a distributed collection of elements.
In Spark, all work is expressed as creating new RDDs, transforming existing
RDDs, or calling operations on RDDs to compute a result. Under the hood, Spark
automatically distributes the data contained in RDDs across your cluster and parallelizes
the operations you perform on them. (Note: this RDD definition comes from the book Learning Spark, a good place to start learning Spark.)#
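As a quick sketch of these ideas, the snippet below can be pasted into spark-shell, where `sc` (the SparkContext) is already defined; the HDFS path is hypothetical:

```scala
// Create an RDD from a local collection...
val nums = sc.parallelize(Seq(1, 2, 3, 4, 5))

// ...or from a file in HDFS (hypothetical path; lazily evaluated).
val lines = sc.textFile("/user/maria_dev/demo/sales.csv")

// Transformations build new RDDs lazily; nothing executes yet.
val squares = nums.map(n => n * n)        // 1, 4, 9, 16, 25
val evens   = squares.filter(_ % 2 == 0)

// Actions trigger the distributed computation.
println(evens.collect().mkString(", "))   // 4, 16
println(squares.reduce(_ + _))            // 55
```

Note that only the actions (`collect`, `reduce`) run work on the cluster; the transformations just record how each RDD is derived from its parent.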
Next time I will try to use a practical data set to demonstrate use cases for the topics below:
- Spark SQL + DataFrames
- Streaming
- MLlib (Machine Learning)
- GraphX (Graph Computation)
- Deep Learning Pipelines (a new open source library that builds on Apache Spark's ML)