Whether you're a total beginner at Spark or a seasoned vet, this hands-on data engineering workshop, led by Austin Ouyang, will teach you the critical fundamentals of the Apache Spark framework, from machine learning pipelines to SparkSQL, before jumping into practical, hands-on tutorials in part two.

In this workshop, you will learn:

  • Apache Spark framework
  • RDDs
  • Transformations and Actions
  • Directed Acyclic Graph (DAG)
  • Jobs, Stages and Tasks
  • Ways to run Spark: local vs. standalone
  • Datasets
  • DataFrames
  • Machine Learning Pipelines
  • SparkSQL

Level:

Beginner - Intermediate

Prerequisites:

What you should know (or have pre-installed) to get the most value.

  • A machine with a Unix-based OS, or a virtual environment supporting one
  • Familiarity with the command line
  • Some experience programming in Python
  • All hands-on examples and projects will be executed on distributed Spark clusters on AWS; the environment will be pre-configured for everyone

Meet Your Instructor:

Austin Ouyang | Lead Platform Engineer | Insight Data Science

The workshop is led by Austin Ouyang, Lead Platform Engineer at Insight Data Science. He currently leads the effort to build out a microservices architecture using Kubernetes on AWS for Insight’s internal tools and services. He has been a lead mentor for Apache Spark workshops, helping data scientists and engineers learn the fundamentals of Spark and write performant Spark jobs. He joined Insight as a Data Engineering Fellow and then co-led the Data Engineering Fellows program for a year.
