Apache Spark in the Cloud
Corso
Online
Hai bisogno di un coach per la formazione?
Ti aiuterà a confrontare vari corsi e trovare l'offerta formativa più conveniente.
Descrizione
-
Tipologia
Corso
-
Metodologia
Online
-
Inizio
Scegli data
Apache Spark's learning curve is slowly increasing at the begining, it needs a lot of effort to get the first return. This course aims to jump through the first tough part. After taking this course the participants will understand the basics of Apache Spark , they will clearly differentiate RDD from DataFrame, they will learn Python and Scala API, they will understand executors and tasks, etc. Also following the best practices, this course strongly focuses on cloud deployment, Databricks and AWS. The students will also understand the differences between AWS EMR and AWS Glue, one of the lastest Spark service of AWS.
AUDIENCE:
Data Engineer, DevOps, Data Scientist
Sedi e date
Luogo
Inizio del corso
Inizio del corso
Profilo del corso
Programing skills (preferably python, scala)
SQL basics
Opinioni
Materie
- Api
- Python
- Apache
Programma
Introduction:
- Apache Spark in Hadoop Ecosystem
- Short intro for python, scala
Basics (theory):
- Architecture
- RDD
- Transformation and Actions
- Stage, Task, Dependencies
Using Databricks environment understand the basics (hands-on workshop):
- Exercises using RDD API
- Basic action and transformation functions
- PairRDD
- Join
- Caching strategies
- Exercises using DataFrame API
- SparkSQL
- DataFrame: select, filter, group, sort
- UDF (User Defined Function)
- Looking into DataSet API
- Streaming
Using AWS environment understand the deployment (hands-on workshop):
- Basics of AWS Glue
- Understand differencies between AWS EMR and AWS Glue
- Example jobs on both environment
- Understand pros and cons
Extra:
- Introduction to Apache Airflow orchestration
Hai bisogno di un coach per la formazione?
Ti aiuterà a confrontare vari corsi e trovare l'offerta formativa più conveniente.
Apache Spark in the Cloud