Data-Intensive Systems : Principles and Fundamentals Using Hadoop and Spark.

By: Wiktorski, Tomasz.
Material type: Text
Series: eBooks on Demand. Advanced Information and Knowledge Processing Ser.
Publisher: Cham : Springer, 2019
Copyright date: ©2019
Description: 1 online resource (105 pages)
Content type: text
Media type: computer
Carrier type: online resource
ISBN: 9783030046033
Subject(s): Apache Hadoop | Big data
Genre/Form: Electronic books
Additional physical formats: Print version: Data-Intensive Systems : Principles and Fundamentals Using Hadoop and Spark
DDC classification: 004.36
LOC classification: QA75.5-76.95
Contents:
Intro -- Contents -- List of Figures -- List of Listings -- 1 Preface -- 1.1 Conventions Used in this Book -- 1.2 Listed Code -- 1.3 Terminology -- 1.4 Examples and Exercises -- 2 Introduction -- 2.1 Growing Datasets -- 2.2 Hardware Trends -- 2.3 The V's of Big Data -- 2.4 NOSQL -- 2.5 Data as the Fourth Paradigm of Science -- 2.6 Example Applications -- 2.6.1 Data Hub -- 2.6.2 Search and Recommendations -- 2.6.3 Retail Optimization -- 2.6.4 Healthcare -- 2.6.5 Internet of Things -- 2.7 Main Tools -- 2.7.1 Hadoop -- 2.7.2 Spark -- 2.8 Exercises -- References -- 3 Hadoop 101 and Reference Scenario -- 3.1 Reference Scenario -- 3.2 Hadoop Setup -- 3.3 Analyzing Unstructured Data -- 3.4 Analyzing Structured Data -- 3.5 Exercises -- 4 Functional Abstraction -- 4.1 Functional Programming Overview -- 4.2 Functional Abstraction for Data Processing -- 4.3 Functional Abstraction and Parallelism -- 4.4 Lambda Architecture -- 4.5 Exercises -- Reference -- 5 Introduction to MapReduce -- 5.1 Reference Code -- 5.2 Map Phase -- 5.3 Combine Phase -- 5.4 Shuffle Phase -- 5.5 Reduce Phase -- 5.6 Embarrassingly Parallel Problems -- 5.7 Running MapReduce Programs -- 5.8 Exercises -- 6 Hadoop Architecture -- 6.1 Architecture Overview -- 6.2 Data Handling -- 6.2.1 HDFS Architecture -- 6.2.2 Read Flow -- 6.2.3 Write Flow -- 6.2.4 HDFS Failovers -- 6.3 Job Handling -- 6.3.1 Job Flow -- 6.3.2 Data Locality -- 6.3.3 Job and Task Failures -- 6.4 Exercises -- 7 MapReduce Algorithms and Patterns -- 7.1 Counting, Summing, and Averaging -- 7.2 Search Assist -- 7.3 Random Sampling -- 7.4 Multiline Input -- 7.5 Inverted Index -- 7.6 Exercises -- References -- 8 NOSQL Databases -- 8.1 NOSQL Overview and Examples -- 8.1.1 CAP and PACELC Theorem -- 8.2 HBase Overview -- 8.3 Data Model -- 8.4 Architecture -- 8.4.1 Regions -- 8.4.2 HFile, HLog, and Memstore.
8.4.3 Region Server Failover -- 8.5 MapReduce and HBase -- 8.5.1 Loading Data -- 8.5.2 Running Queries -- 8.6 Exercises -- References -- 9 Spark -- 9.1 Motivation -- 9.2 Data Model -- 9.2.1 Resilient Distributed Datasets and DataFrames -- 9.2.2 Other Data Structures -- 9.3 Programming Model -- 9.3.1 Data Ingestion -- 9.3.2 Basic Actions-Count, Take, and Collect -- 9.3.3 Basic Transformations-Filter, Map, and reduceByKey -- 9.3.4 Other Operations-flatMap and Reduce -- 9.4 Architecture -- 9.5 SparkSQL -- 9.6 Exercises.
Item type: Electronic Book
Current location: UT Tyler Online (Online)
Call number: QA75.5-76.95
URL: https://ebookcentral.proquest.com/lib/uttyler/detail.action?docID=5628157
Status: Available
Barcode: EBC5628157

Description based on publisher supplied metadata and other sources.