Overview
This four-day instructor-led class gives participants a hands-on introduction to designing and building data processing systems on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, participants will learn how to design data processing systems, build end-to-end data pipelines, analyze data, and carry out machine learning. The course covers structured, unstructured, and streaming data.
This class is intended for experienced developers who are responsible for managing big data transformations, including:
Extracting, loading, transforming, cleaning, and validating data
Designing pipelines and architectures for data processing
Creating and maintaining machine learning and statistical models
Querying datasets, visualizing query results, and creating reports

To get the most out of this course, participants should have:
Completed the Google Cloud Fundamentals: Big Data & Machine Learning course, or have equivalent experience
Basic proficiency with a common query language such as SQL
Experience with data modeling and with extract, transform, load (ETL) activities
Experience developing applications in a common programming language such as Python
Familiarity with machine learning and/or statistics
Objectives
This course teaches participants the following skills:
Design and build data processing systems on Google Cloud Platform
Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
Derive business insights from extremely large datasets using Google BigQuery
Train, evaluate, and predict with machine learning models using TensorFlow and Cloud ML
Enable instant insights from streaming data
Prerequisites
To get the most out of this course, participants should have:
Completed the Google Cloud Fundamentals: Core Infrastructure (GCPFCI) course, or have equivalent experience
Basic proficiency with a common query language such as SQL
Experience with data modeling and with extract, transform, load (ETL) activities
Experience developing applications in a common programming language such as Python
Familiarity with basic statistics
Outline
Module 1: Introduction to Data Engineering
Module 2: Building a Data Lake
Module 3: Building a Data Warehouse
Module 4: Introduction to Building Batch Data Pipelines
Module 5: Executing Spark on Cloud Dataproc
Module 6: Serverless Data Processing with Cloud Dataflow
Module 7: Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Module 8: Introduction to Processing Streaming Data
Module 9: Serverless Messaging with Cloud Pub/Sub
Module 10: Cloud Dataflow Streaming Features
Module 11: High-Throughput BigQuery and Bigtable Streaming Features
Module 12: Advanced BigQuery Functionality and Performance
Module 13: Introduction to Analytics and AI
Module 14: Prebuilt ML model APIs for Unstructured Data
Module 15: Big Data Analytics with Cloud AI Platform Notebooks
Module 16: Production ML Pipelines with Kubeflow
Module 17: Custom Model Building with SQL in BigQuery ML
Module 18: Custom Model Building with Cloud AutoML
If you need training for three or more people, ask us about onsite training. Beyond the obvious benefit of location, the content can be customised to better meet your business objectives, and more can be covered than in a public classroom. It's a cost-effective option. One-on-one training can also be delivered at reasonable rates.
Submit an enquiry from any page on this site and use the requirements box to let us know you are interested, or simply mention it when we contact you.
Our clients have included prestigious national organisations such as Oxford University Press, multi-national private corporations such as JP Morgan and HSBC, as well as public sector institutions such as the Department of Defence and the Department of Health.