Apache Course

Overview

This four-day administrator course provides the technical background you need to manage and scale a Cloudera cluster in a development or production environment. This is the core administrator learning path curriculum.

Take your knowledge to the next level with Cloudera's Administrator Training and Certification. This four-day administrator training course provides participants with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster using Cloudera Manager. From installation and configuration through load balancing and tuning, this training course is the best preparation for the real-world challenges faced by Cloudera administrators.

Administrators who earn the CCA Administrator certification by sitting an exam after this course have demonstrated the technical knowledge to configure, deploy, maintain, and secure an Apache Hadoop cluster.

Audience

This course is best suited to:

  • systems administrators
  • IT managers

Skills Gained

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning skills such as:

  • The internals of YARN, MapReduce, and HDFS
  • Determining the correct hardware and infrastructure for your cluster
  • Proper cluster configuration and deployment to integrate with the data center
  • How to load data into the cluster from dynamically generated files using Flume and from relational databases (RDBMS) using Sqoop
  • Configuring the FairScheduler to provide service-level agreements for multiple users of a cluster
  • Best practices for preparing and maintaining Apache Hadoop in production
  • Troubleshooting, diagnosing, tuning, and solving Hadoop issues

Prerequisites

Participants should have basic Linux experience. Prior knowledge of Apache Hadoop is not required.

The supply of this course by DDLS is governed by the booking terms and conditions. Please read the terms and conditions carefully before enrolling in this course, as enrolment in the course is conditional on acceptance of these terms and conditions.

Outline

  • Introduction
  • The Case for Apache Hadoop
      • Why Hadoop?
      • Fundamental Concepts
      • Core Hadoop Components
  • Hadoop Cluster Installation
      • Rationale for a Cluster Management Solution
      • Cloudera Manager Features
      • Cloudera Manager Installation
      • Hadoop (CDH) Installation
  • The Hadoop Distributed File System (HDFS)
      • HDFS Features
      • Writing and Reading Files
      • NameNode Memory Considerations
      • Overview of HDFS Security
      • Web UIs for HDFS
      • Using the Hadoop File Shell
  • MapReduce and Spark on YARN
      • The Role of Computational Frameworks
      • YARN: The Cluster Resource Manager
      • MapReduce Concepts
      • Apache Spark Concepts
      • Running Computational Frameworks on YARN
      • Exploring YARN Applications Through the Web UIs and the Shell
      • YARN Application Logs
  • Hadoop Configuration and Daemon Logs
      • Cloudera Manager Constructs for Managing Configurations
      • Locating Configurations and Applying Configuration Changes
      • Managing Role Instances and Adding Services
      • Configuring the HDFS Service
      • Configuring Hadoop Daemon Logs
      • Configuring the YARN Service
  • Getting Data Into HDFS
      • Ingesting Data From External Sources With Flume
      • Ingesting Data From Relational Databases With Sqoop
      • REST Interfaces
      • Best Practices for Importing Data
  • Planning Your Hadoop Cluster
      • General Planning Considerations
      • Choosing the Right Hardware
      • Virtualisation Options
      • Network Considerations
      • Configuring Nodes
  • Installing and Configuring Hive, Impala, and Pig
      • Hive
      • Impala
      • Pig
  • Hadoop Clients Including Hue
      • What Are Hadoop Clients?
      • Installing and Configuring Hadoop Clients
      • Installing and Configuring Hue
      • Hue Authentication and Authorisation
  • Advanced Cluster Configuration
      • Advanced Configuration Parameters
      • Configuring Hadoop Ports
      • Configuring HDFS for Rack Awareness
      • Configuring HDFS High Availability
  • Hadoop Security
      • Why Hadoop Security Is Important
      • Hadoop's Security System Concepts
      • What Kerberos Is and How It Works
      • Securing a Hadoop Cluster With Kerberos
      • Other Security Concepts
  • Managing Resources
      • Configuring cgroups with Static Service Pools
      • The Fair Scheduler
      • Configuring Dynamic Resource Pools
      • YARN Memory and CPU Settings
      • Impala Query Scheduling
  • Cluster Maintenance
      • Checking HDFS Status
      • Copying Data Between Clusters
      • Adding and Removing Cluster Nodes
      • Rebalancing the Cluster
      • Directory Snapshots
      • Cluster Upgrading
  • Cluster Monitoring and Troubleshooting
      • Cloudera Manager Monitoring Features
      • Monitoring Hadoop Clusters
      • Troubleshooting Hadoop Clusters
      • Common Misconfigurations
  • Conclusion

Thinking about Onsite?

If you need training for three or more people, ask us about onsite training. Beyond the convenience of location, content can be customised to better meet your business objectives, and more can be covered than in a public classroom. It's a cost-effective option. One-on-one training can also be delivered at reasonable rates.

Submit an enquiry from any page on this site and note your interest in onsite training in the requirements box, or simply mention it when we contact you.

Please note: All courses are available as Live Virtual Classes.

Trusted by over half a million students in 15 countries

Our clients have included prestigious national organisations such as Oxford University Press, multi-national private corporations such as JP Morgan and HSBC, as well as public sector institutions such as the Department of Defence and the Department of Health.