This brief deep-dive course into Elasticsearch and Spark helps you understand how to perform real-time indexing, search, and data analysis. It covers Elasticsearch as a datastore and as a NoSQL database, as well as the Spark processing engine.
In this course you are introduced to Elasticsearch, a search engine based on the Lucene library. You learn that Elasticsearch, developed in Java, provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is the key product of a company called ‘Elastic’ and provides some very important benefits, including:
- Developer-Friendly API
Elasticsearch is API driven. Almost any action can be performed using a simple RESTful API using JSON over HTTP, and client libraries are available for many programming languages. Its documentation is clean and easy to navigate, which improves the quality and user experience of applications built independently on your platform. Elasticsearch can also be integrated with Hadoop for fast query results. Klout, a website that measures social media influence, uses this technique and has scaled from 100 million to 400 million users while reducing database update time from one day to four hours and delivering query results to business analysts in seconds rather than minutes.
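To make "JSON over HTTP" concrete, here is a minimal sketch that builds (but does not send) two requests using only the Python standard library. The host `http://localhost:9200`, the index name `courses`, and the helper function names are assumptions chosen for illustration; a real cluster will usually also require authentication.

```python
import json
from urllib.request import Request

# Assumption: a local, unsecured node on the default port.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build a PUT request that would store `document` under the given id."""
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_search_request(index, field, text):
    """Build a POST request carrying a simple `match` query."""
    query = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index name and document, for illustration only.
index_req = build_index_request("courses", 1, {"title": "Elasticsearch and Spark"})
search_req = build_search_request("courses", "title", "spark")

print(index_req.get_method(), index_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running node; the sketch stops short of that so it can be read and run without a cluster.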
- Real-Time Analytics
Real-time analytics provides updated results of customer events, such as page views, website navigation, shopping cart use, or any other kind of online or digital activity. This data is extremely important for businesses that conduct dynamic analysis and reporting in order to respond quickly to trends in user behavior. With Elasticsearch, data is available for search and analytics almost immediately after it is indexed. Elasticsearch combines the speed of search with the power of analytics for better decision making. It gives insights that streamline your business and improve your products through interactive search and other analysis features.
- Ease of Data Indexing
Data indexing is a way of sorting a number of records on multiple fields. Elasticsearch is schema-free and document-oriented: it stores complex real-world entities as structured JSON documents. Simply index a JSON document and Elasticsearch automatically detects the data structure and types, creates an index, and makes your data searchable. You also have full control to customize how your data is indexed. Indexing simplifies the analytics process by improving the speed of data retrieval.
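The automatic detection of structure and types described above can be pictured with a small toy. The sketch below is not Elasticsearch's implementation; it is a simplified imitation that maps JSON values to type names resembling those Elasticsearch assigns (`text`, `long`, `float`, `boolean`), and it ignores details such as date detection and keyword sub-fields. The function names and the sample document are hypothetical.

```python
def infer_field_type(value):
    """Return a simplified type name for one JSON value, or None if unknown."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}
    if isinstance(value, list):
        # Use the first non-null element to decide the type of an array.
        for item in value:
            if item is not None:
                return infer_field_type(item)
    return None  # nulls and empty arrays contribute no field


def infer_mapping(document):
    """Build a field-name -> type mapping for a JSON-like dict."""
    mapping = {}
    for name, value in document.items():
        field_type = infer_field_type(value)
        if field_type is not None:
            mapping[name] = field_type
    return mapping


# Hypothetical document, for illustration only.
course = {
    "title": "Elasticsearch and Spark",
    "seats": 20,
    "price": 499.0,
    "online": True,
    "tags": ["search", "spark"],
    "instructor": {"name": "A. Teacher"},
    "notes": None,
}
mapping = infer_mapping(course)
print(mapping)
```

The "full control" mentioned above corresponds to supplying an explicit mapping instead of relying on inference like this.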
- Full-Text Search
In a full-text search, a search engine examines all of the words in every stored document as it tries to match search criteria. Elasticsearch builds distributed capabilities on top of Apache Lucene to provide the most powerful full-text search capabilities available in any open source product. Its powerful, developer-friendly query API supports multilingual search, geolocation, contextual did-you-mean suggestions, autocomplete, and result snippets.
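Lucene makes word-level matching fast by keeping an inverted index, a map from each term to the documents that contain it. The following is a toy sketch of that idea in plain Python, assuming a naive lowercase tokenizer and AND semantics across query terms; it leaves out relevance scoring, stemming, and everything else a real engine does. The function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents, for illustration only.
docs = {
    1: "Elasticsearch is a distributed search engine",
    2: "Spark is a processing engine",
    3: "Full-text search with Lucene",
}
index = build_inverted_index(docs)
print(search(index, "search engine"))
print(search(index, "Search"))
```

Because lookups go through the term map, a query touches only the posting sets for its own terms rather than rescanning every document.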
- Resilient Clusters
Elasticsearch clusters are resilient: they detect new or failed nodes, and they reorganize and rebalance data automatically to ensure that your data is safe and accessible. A cluster may contain multiple indices that can be queried independently or as a group. Index aliases allow filtered views of an index and may be updated transparently to your application.
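As an illustration of how an alias can be repointed without the application noticing, the sketch below builds the JSON body for a request to the `_aliases` endpoint, which applies a list of `remove` and `add` actions together. The index names `courses_v1` and `courses_v2`, the alias `courses`, the filter, and the helper function are all assumptions made for this example; the body is only constructed and printed, not sent.

```python
import json


def build_alias_swap(alias, old_index, new_index, filter_query=None):
    """Build an `_aliases` body that moves `alias` from one index to another.

    If `filter_query` is given, the alias becomes a filtered view of the
    new index.
    """
    add_action = {"index": new_index, "alias": alias}
    if filter_query is not None:
        add_action["filter"] = filter_query
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": add_action},
        ]
    }


# Hypothetical names: the application always queries the alias "courses".
payload = build_alias_swap(
    "courses",
    "courses_v1",
    "courses_v2",
    filter_query={"term": {"online": True}},
)
print(json.dumps(payload, indent=2))
```

An application that always queries the alias keeps working through the swap, since only the alias target changes.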
What You Will Learn
- Understand Elasticsearch as a data store
- Appreciate how to use Elasticsearch as a NoSQL database
- Explain Spark as a processing engine
- Gain insight into using Spark and Elasticsearch as machine learning tools
Prerequisites
- Basic awareness of data science is important.
- No other prerequisites.
Who Should Attend
- Anyone interested in data science tools and technologies