Designing and Building Big Data Applications
DESCRIPTION
This four-day training in designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH).
You will work through the entire process of designing and building solutions, including ingesting data, determining the appropriate file format for storage, processing the stored data, and presenting the results to the end user in an easy-to-digest form. You will also go beyond MapReduce, using additional elements of the EDH to develop converged applications that are highly relevant to the business.
TARGET AUDIENCE
This course is best suited to developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems. Participants should have already attended Cloudera Developer Training for Apache Hadoop or have equivalent practical experience. Good knowledge of Java and basic familiarity with Linux are required; experience with SQL is helpful.
OBJECTIVES
At the end of the course, students will be able to:
- Create a data set with the Kite SDK
- Develop custom Flume components for data ingestion
- Manage a multi-stage workflow with Oozie
- Analyze data with Crunch
- Write user-defined functions for Hive and Impala
- Transform data with Morphlines
- Index data with Cloudera Search
COURSE OUTLINE
- Scenario Explanation
- Understanding the Development Environment
- Identifying and Collecting Input Data
- Selecting Tools for Data Processing and Analysis
- Presenting Results to the User
- Metadata Management
- What is Apache Avro?
- Avro Schemas
- Avro Schema Evolution
- Selecting a File Format
- Performance Considerations
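
To make the Avro topics above concrete, a minimal record schema might look like this (the record and field names are illustrative only, not from the course materials):

    {
      "type": "record",
      "name": "Customer",
      "namespace": "com.example",
      "fields": [
        {"name": "id",    "type": "long"},
        {"name": "name",  "type": "string"},
        {"name": "state", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": null}
      ]
    }

Declaring email as a ["null", "string"] union with a default of null is the standard way to add an optional field without breaking schema evolution: readers using an older schema simply ignore it, while readers using the new schema supply the default when reading older data.
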
- What is the Kite SDK?
- Fundamental Data Module Concepts
- Creating New Data Sets Using the Kite SDK
- Loading, Accessing, and Deleting a Data Set
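
A rough sketch of creating, loading, and deleting a data set with the Kite data module (assuming the Kite 1.x Datasets API; the schema file and data set URIs here are hypothetical):

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.kitesdk.data.Dataset;
    import org.kitesdk.data.DatasetDescriptor;
    import org.kitesdk.data.Datasets;

    public class CustomersDataset {
      public static void main(String[] args) throws Exception {
        // Describe the data set with an Avro schema (hypothetical file).
        Schema schema = new Schema.Parser().parse(new File("customer.avsc"));
        DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
            .schema(schema)
            .build();

        // Create a Hive-backed data set; dataset:hdfs: URIs work the same way.
        Dataset<GenericRecord> customers =
            Datasets.create("dataset:hive:customers", descriptor, GenericRecord.class);

        // Loading and deleting reuse the same URI scheme.
        Datasets.load("dataset:hive:customers", GenericRecord.class);
        Datasets.delete("dataset:hive:customers");
      }
    }
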
- What is Apache Sqoop?
- Basic Imports
- Limiting Results
- Improving Sqoop's Performance
- Sqoop 2
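
The Sqoop topics above come down to command-line flags. A typical import (the hostname, credentials, and paths are placeholders) that limits the rows pulled and tunes parallelism might look like:

    sqoop import \
      --connect jdbc:mysql://dbhost.example.com/shop \
      --username dbuser \
      --password-file /user/me/db.password \
      --table customers \
      --where "state = 'CA'" \
      --num-mappers 8 \
      --target-dir /warehouse/customers

--where restricts which rows are imported, and --num-mappers controls how many parallel map tasks do the copy, which is one of the main levers on Sqoop's performance.
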
- What is Apache Flume?
- Basic Flume Architecture
- Flume Sources
- Flume Sinks
- Flume Configuration
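
Flume agents are wired together in a properties file. A minimal single-agent configuration matching the source/channel/sink architecture above (the file paths and component names are hypothetical):

    # One agent with one source, one channel, one sink.
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = snk1

    # Source: follow a hypothetical application log.
    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/myapp/app.log
    agent1.sources.src1.channels = ch1

    # Channel: buffer events in memory.
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000

    # Sink: land the events in HDFS.
    agent1.sinks.snk1.type = hdfs
    agent1.sinks.snk1.hdfs.path = /data/applogs
    agent1.sinks.snk1.channel = ch1
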
- Logging Application Events to Hadoop
- Flume Data Flow and Common Extension Points
- Custom Flume Sources
- Developing a Flume Pollable Source
- Developing a Flume Event-Driven Source
- Custom Flume Interceptors
- Developing a Header-Modifying Flume Interceptor
- Developing a Filtering Flume Interceptor
- Writing Avro Objects with a Custom Flume Interceptor
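
As a sketch of the custom-component topics above, here is a header-modifying interceptor written against the org.apache.flume.interceptor.Interceptor contract (the header name is arbitrary). Returning null from intercept() instead would make it a filtering interceptor:

    import java.util.List;
    import java.util.Map;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    // Stamps each event with the hostname of the agent that saw it.
    public class HostnameInterceptor implements Interceptor {

      @Override
      public void initialize() { }

      @Override
      public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        try {
          headers.put("hostname",
              java.net.InetAddress.getLocalHost().getHostName());
        } catch (java.net.UnknownHostException e) {
          headers.put("hostname", "unknown");
        }
        return event; // return null here to drop (filter out) the event
      }

      @Override
      public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
          intercept(e);
        }
        return events;
      }

      @Override
      public void close() { }

      // Flume instantiates interceptors through a nested Builder.
      public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() { return new HostnameInterceptor(); }

        @Override
        public void configure(Context context) { }
      }
    }
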
- The Need for Workflow Management
- What is Apache Oozie?
- Defining an Oozie Workflow
- Validation, Packaging, and Deployment
- Running and Tracking Workflows Using the CLI
- Hue UI for Oozie
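
An Oozie workflow is an XML document describing actions and the transitions between them. A minimal two-node workflow wrapping a Sqoop import (the schema versions and property names are assumptions; adjust to your cluster):

    <workflow-app name="etl-wf" xmlns="uri:oozie:workflow:0.4">
      <start to="import"/>
      <action name="import">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <command>import --connect ${dbUri} --table customers --target-dir ${outDir}</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Import failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
      </kill>
      <end name="end"/>
    </workflow-app>

Once the workflow is packaged to HDFS, it is run and tracked from the CLI, e.g. oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run, then oozie job -info <job-id>.
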
- What is Apache Crunch?
- Understanding the Crunch Pipeline
- Comparing Crunch to Java MapReduce
- Working with Crunch Projects
- Reading and Writing Data in Crunch
- Data Collection API Functions
- Utility Classes in the Crunch API
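
The Crunch pipeline model is easiest to see in code. A word-count sketch (the canonical example; input and output paths come from the command line):

    import org.apache.crunch.*;
    import org.apache.crunch.impl.mr.MRPipeline;
    import org.apache.crunch.types.writable.Writables;

    public class WordCount {
      public static void main(String[] args) {
        // A pipeline that plans and runs MapReduce jobs on our behalf.
        Pipeline pipeline = new MRPipeline(WordCount.class);

        PCollection<String> lines = pipeline.readTextFile(args[0]);

        // Split each line into words.
        PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
          @Override
          public void process(String line, Emitter<String> emitter) {
            for (String word : line.split("\\s+")) {
              emitter.emit(word);
            }
          }
        }, Writables.strings());

        // Count occurrences of each word and write the result.
        PTable<String, Long> counts = words.count();
        pipeline.writeTextFile(counts, args[1]);
        pipeline.done(); // execution happens here
      }
    }

Compared to raw Java MapReduce, there are no Mapper or Reducer classes and no Writable plumbing; the Crunch planner decides how many MapReduce jobs the pipeline needs, and done() triggers execution.
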
- What is Apache Hive?
- Accessing Hive
- Basic Query Syntax
- Creating and Populating Hive Tables
- How Hive Reads Data
- Using the RegexSerDe in Hive
- What are User-Defined Functions?
- Implementing a User-Defined Function
- Deploying Custom Libraries in Hive
- Registering a User-Defined Function in Hive
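
Tying the UDF topics together, a minimal scalar function using the classic org.apache.hadoop.hive.ql.exec.UDF API (the class and package names are illustrative):

    package com.example;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Strips the domain from an email address: "jo@example.com" -> "jo".
    public class StripDomain extends UDF {
      public Text evaluate(Text email) {
        if (email == null) {
          return null;
        }
        String s = email.toString();
        int at = s.indexOf('@');
        return new Text(at < 0 ? s : s.substring(0, at));
      }
    }

After building the class into a JAR, deployment and registration in Hive are one-liners (the JAR path is a placeholder):

    ADD JAR /path/to/udfs.jar;
    CREATE TEMPORARY FUNCTION strip_domain AS 'com.example.StripDomain';
    SELECT strip_domain(email) FROM customers LIMIT 10;
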
- What is Impala?
- Comparing Hive to Impala
- Running Queries in Impala
- Support for User-Defined Functions
- Data and Metadata Management
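
Because Impala shares the Hive metastore, the practical differences show up in how you run queries and manage metadata. For example, from the shell (the hostname is a placeholder):

    impala-shell -i impalad-host.example.com \
      -q "SELECT state, COUNT(*) FROM customers GROUP BY state"

If a table was just created or loaded through Hive, Impala will not see it until you run INVALIDATE METADATA (or REFRESH for new data files in an existing table).
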
- What is Cloudera Search?
- Search Architecture
- Supported Document Formats
- Collection and Schema Management
- Morphlines
- Indexing Data in Batch Mode
- Indexing Data in Near Real Time
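
A morphline is a HOCON-style configuration that chains transformation commands ending in a Solr load. A sketch for indexing CSV records (the column names and the SOLR_LOCATOR variable are placeholders in the usual style):

    morphlines : [
      {
        id : morphline1
        importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
        commands : [
          # Parse each input line as CSV into named fields.
          { readCSV { separator : ",", columns : [id, name, state] } }
          # Drop any fields the Solr schema does not know about.
          { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }
          # Hand the record to Solr for indexing.
          { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
        ]
      }
    ]

The same morphline can drive both the batch MapReduce indexer and the near-real-time Flume Solr sink, which is what makes it a reusable extraction and transformation layer.
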
- Solr Query Syntax
- Building a Search UI with Hue
- Accessing Impala through JDBC
- Powering a Custom Web Application with Impala and Search
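
Finally, the presentation-layer topics above: Solr's query syntax is URL-based (q for the main query, fq for filters), so a search against a hypothetical customers collection looks like

    http://solr-host.example.com:8983/solr/customers/select?q=name:smith&fq=state:CA&rows=10&wt=json

and a custom web application can reach Impala over JDBC. A minimal sketch, assuming the Hive JDBC driver against Impala's HiveServer2-compatible port 21050 with no Kerberos or SSL:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ImpalaQuery {
      public static void main(String[] args) throws Exception {
        // Impala speaks the HiveServer2 protocol, so the Hive JDBC driver applies.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://impalad-host.example.com:21050/;auth=noSasl");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT state, COUNT(*) AS n FROM customers GROUP BY state")) {
          while (rs.next()) {
            System.out.println(rs.getString("state") + "\t" + rs.getLong("n"));
          }
        }
      }
    }
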