Whether you're a total beginner at Spark or a seasoned veteran, this hands-on data engineering workshop, led by Austin Ouyang, will teach you the critical fundamentals of the Apache Spark framework, from machine learning pipelines to SparkSQL, before jumping into practical, hands-on tutorials in part two.
In this workshop, you will learn:
- Apache Spark framework
- RDDs
- Transformations and Actions
- Directed Acyclic Graph (DAG)
- Jobs, Stages and Tasks
- Ways to run Spark: local vs. standalone
- Datasets
- DataFrames
- Machine Learning Pipelines
- SparkSQL
Level:
Beginner - Intermediate
Prerequisites:
What you should know (or have pre-installed) to get the most value.
- A machine running a Unix-based OS, or a virtual environment that provides one
- Familiarity with the command line
- Some experience programming in Python
- All hands-on examples and projects will be executed on distributed Spark clusters on AWS; the environment will be pre-configured for everyone
Meet Your Instructor:
Austin Ouyang | Lead Platform Engineer | Insight Data Science
The workshop is led by Austin Ouyang, Lead Platform Engineer at Insight Data Science. He currently leads the effort to build out a microservices architecture using Kubernetes on AWS for Insight’s internal tools and services. He has been a lead mentor for Apache Spark workshops, helping data scientists and engineers learn the fundamentals of Spark and write performant Spark jobs. He joined Insight as a Data Engineering Fellow and then co-led the Data Engineering Fellows program for a year.


