Whether you're a total beginner at Spark or a seasoned veteran, this hands-on data engineering workshop, led by Austin Ouyang, will teach you the critical fundamentals of the Apache Spark framework, from machine learning pipelines to SparkSQL, before jumping into practical, hands-on tutorials in part two.
In this workshop, you will learn:
- Apache Spark framework
- RDDs
- Transformations and Actions
- Directed Acyclic Graph (DAG)
- Jobs, Stages and Tasks
- Ways to run Spark: local vs. standalone
- Datasets
- DataFrames
- Machine Learning Pipelines
- SparkSQL
Level:
Beginner - Intermediate
Prerequisites:
What you should know (or have pre-installed) to get the most value.
- A machine running a Unix-based OS, or a virtual environment that provides one
- Familiarity with the command line
- Some experience programming in Python
- All hands-on examples and projects will be executed on distributed Spark clusters on AWS; the environment will be pre-configured for everyone
Meet Your Instructor:
Austin Ouyang | Lead Platform Engineer | Insight Data Science
The workshop is led by Austin Ouyang, Lead Platform Engineer at Insight Data Science. He currently leads the effort to build out a microservices architecture using Kubernetes on AWS for Insight’s internal tools and services. He has been a lead mentor for Apache Spark workshops, helping data scientists and engineers learn the fundamentals of Spark and write performant Spark jobs. He joined Insight as a Data Engineering Fellow and then co-led the Data Engineering Fellows program for a year.


