Spark V2 For Developers
Overview
This course will introduce Apache Spark. The students will learn how to use Spark for data analysis and write Spark applications.
Completely updated for the latest Spark version, 2.x!
Spark 2 introduces many changes compared to v1. This course covers the latest Spark 2 features.
Objective
Learn the Spark ecosystem
What You Will Learn
- Spark Shell
- Spark internals
- Spark data structures: RDDs, DataFrames, Datasets
- Spark APIs
- Spark SQL
- Spark and Hadoop
- Spark MLlib
- Spark GraphX
- Spark Streaming
Audience
Developers / Data Analysts
Prerequisites
- Familiarity with Java, Scala, or Python (labs are in Scala and Python; a quick Scala introduction is provided)
- Basic understanding of the Linux development environment (command-line navigation / running commands)
Lab Environment
We provide the complete lab environment in the cloud. No need to install Spark on your laptop.
Detailed Outline
- Scala primer
- A quick introduction to Scala
- Labs : Getting to know Scala
- Spark Basics
- Big Data, Hadoop, Spark
- What’s new in Spark v2
- Spark concepts and architecture
- Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
- Labs : Installing and running Spark
- Spark Shell
- Spark shell
- Spark web UIs
- Analyzing a dataset – part 1
- Labs: Spark shell exploration
- RDDs (Condensed coverage)
- RDD concepts
- RDD Operations / transformations
- Labs : Unstructured data analytics using RDDs
- Data model concepts
- Partitions
- Distributed processing
- Failure handling
- Caching and persistence
- Spark DataFrames & Datasets
- Intro to DataFrame / Dataset
- Programming with the DataFrame / Dataset API
- Loading structured data using DataFrames
- Labs : DataFrames, Datasets, Caching
- Spark SQL
- Spark SQL concepts and overview
- Defining tables and importing datasets
- Querying data using SQL
- Handling various storage formats : JSON / Parquet / ORC
- Labs : querying structured data using SQL; evaluating data formats
- Spark API programming (Scala / Python)
- Introduction to the Spark API
- Submitting the first program to Spark
- Debugging / logging
- Configuration properties
- Labs : Programming with the Spark API, Submitting jobs
- Spark and Hadoop
- Hadoop Primer : HDFS / YARN
- Hadoop + Spark architecture
- Running Spark on YARN
- Processing HDFS files using Spark
- Spark & Hive
- Machine Learning (ML / MLlib)
- Machine Learning primer
- Machine Learning in Spark : MLlib / ML
- Spark ML overview (newer Spark 2 version)
- Algorithms : Clustering, Classification, Recommendations
- Labs : Writing ML applications in Spark
- GraphX
- GraphX library overview
- GraphX APIs
- Labs : Processing graph data using Spark
- Spark Streaming
- Streaming concepts
- Evaluating streaming platforms
- Spark Streaming library overview
- Streaming operations
- Sliding window operations
- Structured Streaming
- Continuous streaming
- Spark & Kafka streaming
- Labs : Writing Spark Streaming applications
- Spark in the real world
- Highlights of real-world Spark use cases