Chen, Quan.

Task Scheduling for Multi-Core and Parallel Architectures : Challenges, Solutions and Perspectives. - 1 online resource (251 pages) - eBooks on Demand.

Intro -- Preface -- Part I: Background -- Part II: Task Scheduling for Various Parallel Architectures -- Part III: Summary and Perspectives -- Acknowledgements -- Contents -- Acronyms -- Part I Background -- 1 Emerging Parallel Architectures -- 1.1 Parallel Architecture is Dominating the World -- 1.2 Shared Memory Parallel Architecture -- 1.2.1 Multi-core Architecture -- 1.2.2 Multi-socket Multi-core Architecture -- 1.2.3 Asymmetric Multi-core Architecture -- 1.3 Distributed Memory Parallel Architecture -- 1.3.1 Tight-Coupled Distributed Memory Architecture -- 1.3.2 Loose-Coupled Distributed Memory Architecture -- 1.4 Accelerator -- 1.4.1 GPGPU -- 1.4.2 Intel Xeon Phi -- 1.5 Heterogeneous Parallel Architecture -- 1.6 Chapter Highlights -- References -- 2 Conventional Task Scheduling Policies -- 2.1 Manual Task Scheduling Policies -- 2.1.1 Message Passing -- 2.1.2 Multi-threading -- 2.2 Automatic Task Scheduling Policies -- 2.2.1 Task Scheduling Policies for Data Parallelism -- 2.2.2 Task Scheduling Policies for Task Parallelism -- 2.3 Parallel Programming Environments -- 2.3.1 Programming Environments for Data Parallelism -- 2.3.2 Programming Environments for Task Parallelism -- 2.4 Problems in Existing Task Scheduling Systems -- 2.5 Chapter Highlights -- References -- Part II Optimized Task Scheduling for Parallel Architectures -- 3 Work-Stealing for Multi-socket Architecture -- 3.1 Background and Existing Problems -- 3.1.1 The TRICI Problem -- 3.2 Prior Solutions -- 3.2.1 Scalable Locality-Aware Adaptive Work-Stealing (SLAW) -- 3.2.2 Multi-Threaded Shepherds (MTS) -- 3.2.3 Probability Work-Stealing (PWS) -- 3.2.4 Hierarchical Work-Stealing (HWS) -- 3.2.5 CONTROLLED-PDF -- 3.3 Cache-Aware Bi-tier Work-Stealing -- 3.3.1 Solution Overview -- 3.3.2 Design Overview -- 3.4 Cache-Aware Task Graph Partition Policy. 
3.4.1 Full Tree Oriented Partition Policy -- 3.4.2 General Tree Oriented Partition Policy -- 3.5 Bi-tier Work-Stealing Scheduling Policy -- 3.5.1 Work Stealing Algorithm -- 3.5.2 Task Generating Algorithm -- 3.6 Theoretical Time and Space Bounds -- 3.6.1 Theoretical Bounds for Random Work-Stealing -- 3.6.2 Theoretical Bounds for CAB -- 3.7 Implementation Methodology -- 3.7.1 Compiler Support -- 3.7.2 Runtime Support -- 3.8 Evaluation of CAB -- 3.8.1 Performance of CAB-FTO -- 3.8.2 Performance of CAB-GTO -- 3.9 Summary -- 3.9.1 Chapter Highlights -- References -- 4 Work-Stealing for NUMA-enabled Architecture -- 4.1 Chapter Organization -- 4.2 Background and Existing Problems -- 4.3 Prior Solutions -- 4.3.1 Random Pushing -- 4.3.2 Cluster-Aware Hierarchical Stealing (CHS) -- 4.3.3 Cluster-Aware Load-Based Stealing (CLS) -- 4.3.4 Cluster-Aware Random Stealing (CRS) -- 4.3.5 TATL -- 4.3.6 NUMALB -- 4.3.7 Offline Technique for Unstructured Parallelism -- 4.4 Design of Locality-Aware Work-Stealing -- 4.5 Load-Balanced Task Allocator -- 4.6 Cache-Friendly Task Graph Partitioner -- 4.6.1 Decide the Initial Partitioning -- 4.6.2 Search for the Optimal Partitioning -- 4.7 Triple-Level Work-Stealing Policy -- 4.8 Theoretical Validation -- 4.9 Implementation Methodology -- 4.10 Performance Evaluation of LAWS -- 4.10.1 Experimental Platforms -- 4.10.2 Performance of LAWS -- 4.10.3 Effectiveness of Cache-Friendly Task Graph Partitioner -- 4.10.4 Scalability of LAWS -- 4.10.5 Overhead of LAWS -- 4.10.6 Applicability of LAWS -- 4.11 Summary -- 4.11.1 Chapter Highlights -- References -- 5 Dynamic Load Balancing for Asymmetric Multi-core Architecture -- 5.1 Chapter Organization -- 5.2 Problem Formulation -- 5.3 Existing Solutions -- 5.3.1 Task Snatching Technique -- 5.3.2 CAMP -- 5.3.3 Bias Scheduling -- 5.3.4 Age-Based Scheduling -- 5.3.5 Speed-Based Balancing. 
5.3.6 Scheduling on AMC with Hardware Support -- 5.4 Theoretical Ideal Task Scheduling -- 5.5 A Practical Polynomial Time Solution -- 5.6 Design of Asymmetric-Aware Task Scheduling -- 5.6.1 Processing Flow of AATS -- 5.7 History-Based Task Allocation -- 5.7.1 Build Task Classes -- 5.7.2 Allocate Task Classes to C-Groups -- 5.8 Preference-Based Work-Stealing -- 5.8.1 Scheduling Within a C-Group -- 5.8.2 Scheduling Among C-Groups -- 5.9 Implementation Methodology of AATS -- 5.10 Performance of AATS -- 5.10.1 Experimental Configurations -- 5.10.2 Performance on Emulated Platform -- 5.10.3 Effectiveness of the Preference-Based Work-Stealing -- 5.10.4 Scalability of AATS -- 5.10.5 Integrating Task-Snatching in AATS -- 5.11 Summary -- 5.11.1 Chapter Highlights -- References -- 6 Load Balancing for Heterogeneous Parallel Architecture -- 6.1 Background and Existing Problems -- 6.2 Prior Solutions -- 6.2.1 Static Scheduling -- 6.2.2 Quick Scheduling -- 6.2.3 Split Scheduling -- 6.2.4 FinePar -- 6.3 Heterogeneous-Aware Task Scheduling -- 6.4 Comparison of the Scheduling Policies -- 6.5 Performance of Dynamic Scheduling Policies -- 6.5.1 Experimental Setup -- 6.5.2 Performance -- 6.5.3 Effectiveness of Balancing Workload -- 6.5.4 Effectiveness of Predicting the Performance of GPU -- 6.5.5 Impact of Profiling Granularity -- 6.6 Summary -- 6.6.1 Chapter Highlights -- References -- 7 MapReduce for Cloud Computing -- 7.1 Introduction to MapReduce -- 7.1.1 Scheduling Policy in MapReduce -- 7.1.2 Adapting to Other Platforms -- 7.1.3 Variations of MapReduce -- 7.1.4 Existing Problem in Heterogeneous Environment -- 7.2 Prior Solutions -- 7.2.1 Least Progress Policy -- 7.2.2 Longest Approximate Time to End Policy -- 7.2.3 Calculating Progress Score -- 7.2.4 Problems in Existing Solutions -- 7.2.5 Tarazu -- 7.3 Self-adaptive MapReduce Scheduling -- 7.3.1 Overview of SAMR. 
7.3.2 Tuning Phase Weights -- 7.3.3 Calculating Progress Score -- 7.3.4 Identifying Straggler Task -- 7.3.5 Identifying Slow Node -- 7.3.6 Boosting Straggler Task -- 7.4 Implementation of SAMR -- 7.5 Performance Evaluation -- 7.5.1 Experimental Setup -- 7.5.2 Performance -- 7.5.3 Effectiveness of Speculative Execution and Weight Tuning -- 7.5.4 Parameter Selection in SAMR -- 7.6 Summary -- 7.6.1 Chapter Highlights -- References -- 8 QoS-Aware Task Reordering for Accelerators -- 8.1 Background and Existing Problems -- 8.2 Prior Work on Handling Accelerator Co-location -- 8.2.1 TimeGraph -- 8.2.2 GPU-EvR -- 8.2.3 Simultaneous Multi-kernel (SMK) -- 8.2.4 GPU Thread Preemption -- 8.3 Real System Investigation on Accelerator Co-location -- 8.4 Investigation on Priority-Based Scheduling Policy -- 8.5 Design of Task Scheduling Mechanism on Accelerators -- 8.6 Case Study: QoS-Aware Task Scheduling on Accelerator -- 8.6.1 Root Causes of Long Tail Latency at Co-location -- 8.6.2 Design of Baymax -- 8.7 Task Duration Modeling in Baymax -- 8.7.1 Task Duration Predictor -- 8.7.2 Selecting Representative Features -- 8.7.3 Low Overhead Prediction Models -- 8.7.4 Minimizing Prediction Error -- 8.7.5 Prediction Accuracy -- 8.8 Scheduling Hand-Written Kernels and Library Calls -- 8.8.1 Breaking down the End-to-end Latency -- 8.8.2 Scheduling Policy -- 8.9 Scheduling Data Transfer Tasks -- 8.9.1 Characterizing PCI-e Bandwidth Contention -- 8.9.2 Scheduling Policy -- 8.10 Performance of Baymax -- 8.10.1 Experimental Configuration -- 8.10.2 QoS and Throughput -- 8.10.3 Scheduling Data Transfer Tasks -- 8.10.4 Beyond Pair-Wise Co-locations -- 8.11 Summary -- 8.11.1 Chapter Highlights -- References -- Part III Summary and Discussion -- 9 Summary and Discussion -- 9.1 Guideline of Scheduling Technique Design -- 9.2 Multi-socket Architecture. 
9.3 NUMA-Enabled Multi-socket Architecture -- 9.4 Asymmetric Multi-core Architecture -- 9.5 Heterogeneous CPU+GPU Architecture -- 9.6 Heterogeneous Cloud Platform -- 9.7 Non-preemptive Accelerator Architecture -- Glossary.

9789811062384
Computer scheduling.
Parallel processing (Electronic computers).


Electronic books.

QA75.5-76.95

004