CMDBID: 74639 | Course Code: 1262 | Duration: 4 Days
Overview>
This course takes a detailed look at how to implement Big Data solutions using Apache Spark. The course is taught in Scala, although it can also be delivered in Python or Java if required.
On this course you'll learn:
Big Data principles
Creating and using RDDs
Spark Streaming
Spark SQL
Spark Machine Learning
Spark Graph Processing
Audience>
This course is for those who need a detailed look at how to implement Big Data solutions using Apache Spark.
Prerequisites>
You should have solid experience in Scala (or Python/Java).
Outline>
Introduction to Big Data
Introduction to Hadoop
Data serialization
Column-based storage
Messaging systems
NoSQL
Distributed SQL query engine
Introduction to Apache Spark
Key features of Spark
Spark architecture
Application execution
Resilient Distributed Datasets
Spark API
Caching
Spark jobs
Interactive Data Analysis with Spark Shell
Key concepts
REPL commands
Using Scala
Number analysis
Log analysis
Writing Spark Applications
Writing a Hello world application
Compiling and running an application
Monitoring and debugging an application
Spark Streaming
Overview of Spark Streaming
Spark Streaming API
Creating a discretized stream
Processing a discretized stream
Output operations
Spark SQL
Overview of Spark SQL
Performance considerations
Usage scenarios
Spark SQL API
Built-in functions
Machine Learning with Spark
Overview of Machine Learning
Spark Machine Learning Libraries (MLlib API)
Spark ML
Graph Processing with Spark
Overview of graphs
Overview of GraphX API
Using GraphX API
Cluster Managers
Standalone cluster manager
Apache Mesos
YARN
Our Clients
Our clients have included prestigious national organisations such as Oxford University Press, multi-national private corporations such as JP Morgan and HSBC, as well as public sector institutions such as the Department of Defence and the Department of Health.