Healey, Christopher G.

Disk-Based Algorithms for Big Data. - 1 online resource (205 pages) - eBooks on Demand .

Cover -- Half Title -- Title Page -- Copyright Page -- Dedication -- Table of Contents -- List of Tables -- List of Figures -- Preface -- Chapter 1: Physical Disk Storage -- 1.1 PHYSICAL HARD DISK -- 1.2 CLUSTERS -- 1.2.1 Block Allocation -- 1.3 ACCESS COST -- 1.4 LOGICAL TO PHYSICAL -- 1.5 BUFFER MANAGEMENT -- Chapter 2: File Management -- 2.1 LOGICAL COMPONENTS -- 2.1.1 Positioning Components -- 2.2 IDENTIFYING RECORDS -- 2.2.1 Secondary Keys -- 2.3 SEQUENTIAL ACCESS -- 2.3.1 Improvements -- 2.4 DIRECT ACCESS -- 2.4.1 Binary Search -- 2.5 FILE MANAGEMENT -- 2.5.1 Record Deletion -- 2.5.2 Fixed-Length Deletion -- 2.5.3 Variable-Length Deletion -- 2.6 FILE INDEXING -- 2.6.1 Simple Indices -- 2.6.2 Index Management -- 2.6.3 Large Index Files -- 2.6.4 Secondary Key Index -- 2.6.5 Secondary Key Index Improvements -- Chapter 3: Sorting -- 3.1 HEAPSORT -- 3.2 MERGESORT -- 3.3 TIMSORT -- Chapter 4: Searching -- 4.1 LINEAR SEARCH -- 4.2 BINARY SEARCH -- 4.3 BINARY SEARCH TREE -- 4.4 k-d TREE -- 4.4.1 k-d Tree Index -- 4.4.2 Search -- 4.4.3 Performance -- 4.5 HASHING -- 4.5.1 Collisions -- 4.5.2 Hash Functions -- 4.5.3 Hash Value Distributions -- 4.5.4 Estimating Collisions -- 4.5.5 Managing Collisions -- 4.5.6 Progressive Overflow -- 4.5.7 Multirecord Buckets -- Chapter 5: Disk-Based Sorting -- 5.1 DISK-BASED MERGESORT -- 5.1.1 Basic Mergesort -- 5.1.2 Timing -- 5.1.3 Scalability -- 5.2 INCREASED MEMORY -- 5.3 MORE HARD DRIVES -- 5.4 MULTISTEP MERGE -- 5.5 INCREASED RUN LENGTHS -- 5.5.1 Replacement Selection -- 5.5.2 Average Run Size -- 5.5.3 Cost -- 5.5.4 Dual Hard Drives -- Chapter 6: Disk-Based Searching -- 6.1 IMPROVED BINARY SEARCH -- 6.1.1 Self-Correcting BSTs -- 6.1.2 Paged BSTs -- 6.2 B-TREE -- 6.2.1 Search -- 6.2.2 Insertion -- 6.2.3 Deletion -- 6.3 B* TREE -- 6.4 B+ TREE -- 6.4.1 Prefix Keys -- 6.5 EXTENDIBLE HASHING -- 6.5.1 Trie. 6.5.2 Radix Tree -- 6.6 HASH TRIES -- 6.6.1 Trie Insertion -- 6.6.2 Bucket Insertion -- 6.6.3 Full Trie -- 6.6.4 Trie Size -- 6.6.5 Trie Deletion -- 6.6.6 Trie Performance -- Chapter 7: Storage Technology -- 7.1 OPTICAL DRIVES -- 7.1.1 Compact Disc -- 7.1.2 Digital Versatile Disc -- 7.1.3 Blu-ray Disc -- 7.2 SOLID STATE DRIVES -- 7.2.1 Floating Gate Transistors -- 7.2.2 Read-Write-Erase -- 7.2.3 SSD Controller -- 7.2.4 Advantages -- 7.3 HOLOGRAPHIC STORAGE -- 7.3.1 Holograms -- 7.3.2 Data Holograms -- 7.3.3 Commercialization -- 7.4 MOLECULAR MEMORY -- 7.5 MRAM -- Chapter 8: Distributed Hash Tables -- 8.1 HISTORY -- 8.2 KEYSPACE -- 8.3 KEYSPACE PARTITIONING -- 8.4 OVERLAY NETWORK -- 8.5 CHORD -- 8.5.1 Keyspace -- 8.5.2 Keyspace Partitioning -- 8.5.3 Overlay Network -- 8.5.4 Addition -- 8.5.5 Failure -- Chapter 9: Large File Systems -- 9.1 RAID -- 9.1.1 Parity -- 9.2 ZFS -- 9.2.1 Fault Tolerance -- 9.2.2 Self-Healing -- 9.2.3 Snapshots -- 9.3 GFS -- 9.3.1 Architecture -- 9.3.2 Master Metadata -- 9.3.3 Mutations -- 9.3.4 Fault Tolerance -- 9.4 HADOOP -- 9.4.1 MapReduce -- 9.4.2 MapReduce Implementation -- 9.4.3 HDFS -- 9.4.4 Pig -- 9.4.5 Hive -- 9.5 CASSANDRA -- 9.5.1 Design -- 9.5.2 Improvements -- 9.5.3 Query Language -- 9.6 PRESTO -- Chapter 10: NoSQL Storage -- 10.1 GRAPH DATABASES -- 10.1.1 Neo4j -- 10.1.2 Caching -- 10.1.3 Query Languages -- 10.2 DOCUMENT DATABASES -- 10.2.1 SQL Versus NoSQL -- 10.2.2 MongoDB -- 10.2.3 Indexing -- 10.2.4 Query Languages -- Appendix A: Order Notation -- A.1 Θ-NOTATION -- A.2 Ο-NOTATION -- A.3 Ω-NOTATION -- A.4 INSERTION SORT -- A.5 SHELL SORT -- Appendix B: Assignment 1: Search -- B.1 KEY AND SEEK LISTS -- B.2 PROGRAM EXECUTION -- B.3 IN-MEMORY SEQUENTIAL SEARCH -- B.4 IN-MEMORY BINARY SEARCH -- B.5 ON-DISK SEQUENTIAL SEARCH -- B.6 ON-DISK BINARY SEARCH -- B.7 PROGRAMMING ENVIRONMENT. B.7.1 Reading Binary Integers -- B.7.2 Measuring Time -- B.7.3 Writing Results -- B.8 SUPPLEMENTAL MATERIAL -- B.9 HAND-IN REQUIREMENTS -- Appendix C: Assignment 2: Indices -- C.1 STUDENT FILE -- C.2 PROGRAM EXECUTION -- C.3 IN-MEMORY PRIMARY KEY INDEX -- C.4 IN-MEMORY AVAILABILITY LIST -- C.4.1 First Fit -- C.4.2 Best Fit -- C.4.3 Worst Fit -- C.5 USER INTERFACE -- C.5.1 Add -- C.5.2 Find -- C.5.3 Del -- C.5.4 End -- C.6 PROGRAMMING ENVIRONMENT -- C.6.1 Writing Results -- C.7 SUPPLEMENTAL MATERIAL -- C.8 HAND-IN REQUIREMENTS -- Appendix D: Assignment 3: Mergesort -- D.1 INDEX FILE -- D.2 PROGRAM EXECUTION -- D.3 AVAILABLE MEMORY -- D.4 BASIC MERGESORT -- D.5 MULTISTEP MERGESORT -- D.6 REPLACEMENT SELECTION MERGESORT -- D.7 PROGRAMMING ENVIRONMENT -- D.7.1 Measuring Time -- D.7.2 Writing Results -- D.8 SUPPLEMENTAL MATERIAL -- D.9 HAND-IN REQUIREMENTS -- Appendix E: Assignment 4: B-Trees -- E.1 INDEX FILE -- E.2 PROGRAM EXECUTION -- E.3 B-TREE NODES -- E.3.1 Root Node Offset -- E.4 USER INTERFACE -- E.4.1 Add -- E.4.2 Find -- E.4.3 Print -- E.4.4 End -- E.5 PROGRAMMING ENVIRONMENT -- E.6 SUPPLEMENTAL MATERIAL -- E.7 HAND-IN REQUIREMENTS -- Index.

9781315302867


Big data.


Electronic books.

QA76.9.B45.H43 2017

005.7