Apache Spark Course

Overview

This course takes a detailed look at how to implement Big Data solutions using Apache Spark. The course uses the Scala programming language, although it can also be delivered in Python or Java if required.

On this course you'll learn:

  • Big Data principles
  • Creating and using RDDs
  • Spark Streaming
  • Spark SQL
  • Spark Machine Learning
  • Spark Graph Processing

Audience

This course is for those who need a detailed look at how to implement Big Data solutions using Apache Spark.

Prerequisites

You should have solid experience in Scala (or Python/Java).

Outline

  • Introduction to Big Data
      • Introduction to Hadoop
      • Data serialization
      • Column-based storage
      • Messaging systems
      • NoSQL
      • Distributed SQL query engine
  • Introduction to Apache Spark
      • Key features of Spark
      • Spark architecture
      • Application execution
      • Resilient Distributed Datasets
      • Spark API
      • Caching
      • Spark jobs
  • Interactive Data Analysis with Spark Shell
      • Key concepts
      • REPL commands
      • Using Scala
      • Number analysis
      • Log analysis
  • Writing Spark Applications
      • Writing a Hello world application
      • Compiling and running an application
      • Monitoring and debugging an application
  • Spark Streaming
      • Overview of Spark streaming
      • Spark streaming API
      • Creating a discretized stream
      • Processing a discretized stream
      • Output operations
  • Spark SQL
      • Overview of Spark SQL
      • Performance considerations
      • Usage scenarios
      • Spark SQL API
      • Built-in functions
  • Machine Learning with Spark
      • Overview of Machine Learning
      • Spark Machine Learning Libraries (MLlib API)
      • Spark ML
  • Graph Processing with Spark
      • Overview of graphs
      • Overview of GraphX API
      • Using GraphX API
  • Cluster Managers
      • Standalone cluster manager
      • Apache Mesos
      • YARN

Thinking about Onsite?

If you need training for 3 or more people, you should ask us about onsite training. Putting aside the obvious location benefit, content can be customised to better meet your business objectives, and more can be covered than in a public classroom. It's a cost-effective option. One-on-one training can also be delivered at reasonable rates.

Submit an enquiry from any page on this site and let us know in the requirements box that you are interested, or simply mention it when we contact you.

All $ prices are in USD unless it’s a NZ or AU date

SPVC = Self Paced Virtual Class

LVC = Live Virtual Class

Please Note: All courses are available as Live Virtual Classes

Trusted by over half a million students in 15 countries

Our clients have included prestigious national organisations such as Oxford University Press, multi-national private corporations such as JP Morgan and HSBC, as well as public sector institutions such as the Department of Defence and the Department of Health.