Linux Course

Overview

This course covers the essentials of deploying and managing an Apache Hadoop cluster. The course is lab-intensive, with each participant creating their own Hadoop cluster using either the CDH (Cloudera's Distribution, including Apache Hadoop) or Hortonworks Data Platform stack. Core Hadoop services are explored in depth, with emphasis on troubleshooting and recovering from common cluster failures. The fundamentals of related services such as Ambari, Zookeeper, Pig, Hive, HBase, Sqoop, Flume, and Oozie are also covered. The course is approximately 60% lecture and 40% labs.

Version: D05

Prerequisites

Qualified participants should be comfortable with Linux commands and have some systems administration experience, but do not need previous Hadoop experience.

Supported Distributions: Red Hat Enterprise Linux 7

Outline

  1. HDFS
    1. Design Goals
    2. Design
    3. Blocks
    4. Block Replication
    5. Namenode Daemon
    6. Secondary Namenode Daemon
    7. Datanode Daemon
    8. Accessing HDFS
    9. Permissions and Users
    10. Adding and Removing Datanodes
    11. Balancing
    Lab Tasks
    1. Single Node HDFS
    2. Multi-node HDFS
    3. Files and HDFS
    4. Managing and Maintaining HDFS
  2. YARN
    1. YARN Design Goals
    2. YARN Architecture
    3. Resource Manager
    4. Node Manager
    5. Containers
    6. YARN: Other Important Features
    7. Slider
    Lab Tasks
    1. YARN
  3. MapReduce
    1. MapReduce
    2. Terminology and Data Flow
    Lab Tasks
    1. MapReduce
  4. Installing Hadoop with Ambari
    Lab Tasks
    1. CDH Uninstall
    2. Installing Hadoop with Ambari
    3. Tez
  5. Data Ingestion
    1. Sqoop
    2. Flume
    3. Kafka
    Lab Tasks
    1. Sqoop
  6. Data Lineage and Governance
    1. Falcon
    2. Atlas
    3. Oozie
  7. Data Processing Frameworks
    1. The Bane of MapReduce
    2. Tez Overview
    3. Pig
    4. Hive
    5. Spark
    6. Storm
    7. Solr
    Lab Tasks
    1. Pig
  8. NoSQL Implementations
    1. HBase
    2. Phoenix
  9. Cluster Management
    1. Ambari Metrics System (AMS)
    2. Zookeeper

Thinking about Onsite?

If you need training for three or more people, ask us about onsite training. Beyond the obvious benefit of location, content can be customised to better meet your business objectives, and more can be covered than in a public classroom. It's a cost-effective option. One-on-one training can also be delivered at reasonable rates.

Submit an enquiry from any page on this site and use the requirements box to let us know you are interested, or simply mention it when we contact you.

All $ prices are in USD unless the date is for NZ or AU

SPVC = Self Paced Virtual Class

LVC = Live Virtual Class

Please Note: All courses are available as Live Virtual Classes

Trusted by over half a million students in 15 countries

Our clients have included prestigious national organisations such as Oxford University Press, multi-national private corporations such as JP Morgan and HSBC, as well as public sector institutions such as the Department of Defence and the Department of Health.