Parallel Processing and Applied Mathematics : 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part II.

By: Wyrzykowski, Roman
Contributor(s): Dongarra, Jack | Karczewski, Konrad | Wasniewski, Jerzy
Material type: Text
Series: eBooks on Demand
Publisher: Berlin/Heidelberg : Springer Berlin Heidelberg, 2014
Copyright date: ©2014
Description: 1 online resource (785 pages)
Content type: text
Media type: computer
Carrier type: online resource
ISBN: 9783642551956
Subject(s): Parallel processing (Electronic computers)-Congresses | Mathematics-Congresses
Genre/Form: Electronic books.
Additional physical formats: Print version: Parallel Processing and Applied Mathematics : 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part II
DDC classification: 004.35
LOC classification: QA76.58 .P373 2014
Contents:
Intro -- Preface -- Organization -- Contents - Part II -- Contents - Part I -- Workshop on Scheduling for Parallel Computing (SPC 2013) -- Scheduling Bag-of-Tasks Applications to Optimize Computation Time and Cost -- 1 Introduction -- 2 Heuristics -- 3 Computational Experiments -- 4 Conclusions -- References -- Scheduling Moldable Tasks with Precedence Constraints and Arbitrary Speedup Functions on Multiprocessors -- 1 Introduction -- 2 Definitions and Notation -- 3 Related Work -- 4 Scheduling Algorithm -- 4.1 Pseudocode -- 4.2 Asymptotic Run-Time Analysis -- 5 Evaluation -- 5.1 DAGs and Platforms -- 5.2 Simulation Results -- 6 Discussion and Conclusions -- References -- OStrich: Fair Scheduling for Multiple Submissions -- 1 Introduction -- 2 State-of-the-Art -- 3 Model and Problem Definition -- 4 Algorithm -- 5 Theoretical Analysis -- 5.1 Worst-Case Bound -- 6 Simulations -- 7 Concluding Remarks -- References -- Fair Share Is Not Enough: Measuring Fairness in Scheduling with Cooperative Game Theory -- 1 Introduction -- 2 The Scheduling Model -- 3 Fairness by the Shapley Value -- 3.1 Computing the Shapley Value -- 3.2 Strategy-Resilient Utility Functions -- 4 Algorithms -- 5 Simulation Experiments -- 5.1 Settings -- 5.2 Results -- 6 Conclusions -- References -- Setting up Clusters of Computing Units to Process Several Data Streams Efficiently -- 1 Introduction and Related Works -- 2 Presentation of the AS4DR Method -- 3 Resource Selection for AS4DR in a Multiple Data Streams Context -- 3.1 Method -- 3.2 Experimental Assessment -- 4 Conclusion -- References -- The 5th Workshop on Language-Based Parallel Programming Models (WLPP 2013) -- Towards Standardization of Measuring the Usability of Parallel Languages -- 1 Introduction -- 2 The Building Blocks of Empirical Experiments -- 2.1 The Algorithms -- 2.2 The Languages -- 2.3 The Hardware Platforms.
2.4 The Parallel Debuggers -- 2.5 The Human-Subjects -- 2.6 The Metrics -- 2.7 The Experiment Duration -- 2.8 Usability vs. Scalability -- 2.9 Human vs. Machine -- 3 Related Work -- 4 Conclusions -- References -- Experiences with Implementing Task Pools in Chapel and X10 -- 1 Introduction -- 2 Background and Benchmark -- 2.1 Chapel -- 2.2 X10 -- 2.3 Task Pools -- 2.4 UTS -- 3 Language Assessment -- 3.1 Overview of Implementations -- 3.2 Object-Orientation and Parallelism -- 3.3 References, Values, and Copying -- 3.4 Worker Management and Initialization -- 3.5 Reduction -- 3.6 Diverse Language Issues -- 4 Performance -- 5 Related Work -- 6 Conclusions -- References -- Parampl: A Simple Approach for Parallel Execution of AMPL Programs -- 1 Introduction -- 2 Related Work -- 3 Design of Parampl -- 4 Evaluation and Experiments -- 5 Conclusion -- References -- Prototyping Framework for Parallel Numerical Computations -- 1 Introduction -- 2 Related Work -- 3 Tool Kaira -- 4 Libraries -- 4.1 Octave Libraries -- 5 Case Study -- 5.1 Experiments and Results -- 6 Conclusion -- References -- Algorithms for In-Place Matrix Transposition -- 1 Introduction -- 2 Notation and Terminology -- 3 Matrix Transposition Based on Cycle Following -- 4 Matrix Transposition Based on Swaps -- 4.1 The Partition Phase -- 4.2 The Transpose Phase -- 4.3 The Exchange Operation -- 4.4 The Shuffle and Unshuffle Operations -- 5 Variations and Alternatives to the TT Algorithm -- 5.1 A Divide-and-Conquer Version of the Shuffle and Unshuffle Operations -- 5.2 The Use of Constant Additional Memory -- 5.3 Exploiting Parallelism -- 5.4 Novel Algorithms for Matrix Transposition -- 6 Conclusions -- References -- FooPar: A Functional Object Oriented Parallel Framework in Scala -- 1 Introduction -- 2 Definitions, Notations, and Isoefficiency -- 3 The FooPar Framework -- 3.1 Technologies.
3.2 SPMD Operations on Distributed Sequences -- 3.3 Data Structures -- 4 Matrix-Matrix Multiplication in FooPar -- 4.1 Serial Matrix-Matrix Multiplication -- 4.2 Generic Algorithm for Parallel Matrix-Matrix Multiplication -- 4.3 Grid Abstraction in FooPar for Parallel Matrix-Matrix Multiplication -- 5 Test Results -- 6 Conclusions -- References -- Effects of Segmented Finite Difference Time Domain on GPU -- 1 Introduction -- 2 The Finite Difference Time Domain Method -- 3 Segmented FDTD and Its Implementation on GPU -- 4 Results -- 5 Conclusion -- References -- Optimization of an OpenCL-Based Multi-swarm PSO Algorithm on an APU -- 1 Introduction -- 2 Architecture and Runtime System -- 3 Algorithm, Implementation and Optimization -- 3.1 MPSO Algorithm -- 3.2 Data Layout -- 3.3 Random Number Generation -- 3.4 Particle Initialization -- 3.5 Update Fitness -- 3.6 Update Bests -- 3.7 Update Position/Velocity -- 3.8 Find Best/Worst Particles -- 3.9 Swap Particles -- 4 Results -- 5 Conclusion -- References -- Core Allocation Policies on Multicore Platforms to Accelerate Forest Fire Spread Predictions -- 1 Introduction -- 2 Hybrid MPI-OpenMP Master/Worker Prediction Scheme -- 2.1 Evaluating the Hybrid Scheme -- 3 FARSITE Characterization -- 4 Experimental Study -- 5 Conclusions and Future Work -- References -- The 4th Workshop on Performance Evaluation of Parallel Applications on Large-Scale Systems -- The Effect of Parallelization on a Tetrahedral Mesh Optimization Method -- 1 Introduction -- 2 Our Approach to Tetrahedral Mesh Optimization -- 3 Parallel Algorithm for Mesh Untangling and Smoothing -- 4 Experimental Methodology -- 5 Performance Evaluation -- 5.1 Performance Scalability -- 5.2 Load Balancing -- 5.3 Parallelism Bottlenecks -- 5.4 Influence of Graph Coloring Algorithms on Parallel Performance -- 6 Conclusions and Future Work -- References.
Analysis of Partitioning Models and Metrics in Parallel Sparse Matrix-Vector Multiplication -- 1 Introduction -- 2 Parallel SpMxV Operation and Software -- 2.1 Libraries -- 2.2 Investigated Partitioning Metrics and Methods -- 3 Experimental Investigations -- 3.1 Regression Analysis -- 3.2 Summary of Further Results -- 4 Conclusion -- References -- Achieving Memory Scalability in the GYSELA Code to Fit Exascale Constraints -- 1 Introduction -- 2 Overview of GYSELA -- 3 Memory Bottleneck -- 3.1 Analysis -- 3.2 Approach -- 4 Customised Modeling and Tracing Memory Tools -- 4.1 Trace File -- 4.2 Visualization -- 4.3 Prediction -- 5 Results -- 5.1 Memory Footprint Reduction -- 5.2 Prediction over Large Meshes -- 6 Conclusion -- References -- Probabilistic Analysis of Barrier Eliminating Method Applied to Load-Imbalanced Parallel Application -- 1 Introduction -- 2 A Probabilistic Analysis of a Barrier Eliminating Algorithm -- 2.1 The Behavioral Model of Parallel Program -- 2.2 The Definition of Dependency Matrix -- 2.3 Probability Distribution of the Execution Time -- 3 Evaluation -- 3.1 Results -- 3.2 Discussion -- 4 Related Work -- 5 Conclusion -- References -- Multi-GPU Parallel Memetic Algorithm for Capacitated Vehicle Routing Problem -- 1 Introduction -- 2 Capacitated Vehicle Routing Problem -- 3 The GPU Algorithm -- 3.1 Algorithm Analysis -- 4 Computational Experiments -- 5 Conclusions -- References -- Parallel Applications Performance Evaluation Using the Concept of Granularity -- 1 Introduction -- 2 Performance Metrics -- 3 Using Granularity for Performance Analysis -- 4 Case Studies -- 4.1 Experimental Results -- 5 Conclusions and Future Work -- References -- Workshop on Parallel Computational Biology (PBC 2013) -- Resolving Load Balancing Issues in BWA on NUMA Multicore Architectures -- 1 Introduction -- 2 BWA Implementation.
2.1 Burrows-Wheeler Alignment Algorithm -- 2.2 Measuring Load Imbalance -- 3 Removing Load Imbalance with Cilk -- 3.1 Cilk-Based Parallelisation -- 3.2 Improved Scaling Results -- 4 Other Issues -- 4.1 Memory Latency and Hyperthreading -- 4.2 Parallel Versus Sequential Section -- 5 Related Work -- 6 Conclusions -- References -- K-mulus: Strategies for BLAST in the Cloud -- 1 Introduction -- 2 Methods -- 2.1 MapReduce -- 2.2 Parallelization Strategies -- 2.3 K-mer Indexing -- 3 Results -- 3.1 Comparison of Parallelization Approaches on a Modest Size Cluster -- 3.2 Analysis of Database K-Mer Index -- 4 Discussion -- References -- Faster GPU-Accelerated Smith-Waterman Algorithm with Alignment Backtracking for Short DNA Sequences -- 1 Introduction -- 2 Methods -- 2.1 The Smith-Waterman Algorithm -- 2.2 GPU Architecture -- 2.3 Parallelization Using CUDA -- 3 Performance Evaluation -- 4 Conclusions -- References -- Accelerating String Matching on MIC Architecture for Motif Extraction -- 1 Introduction -- 2 Definitions and Notation -- 3 Algorithms -- 4 Implementation -- 4.1 MIC Implementation -- 5 Experimental Results -- References -- A Parallel, Distributed-Memory Framework for Comparative Motif Discovery -- 1 Introduction -- 2 Comparative Motif Discovery Framework -- 3 Distributed-Memory, Parallel Implementation -- 4 Results and Current Limitations -- 5 Conclusion and Future Research directions -- References -- Parallel Seed-Based Approach to Protein Structure Similarity Detection -- 1 Introduction -- 1.1 Alignment Graphs -- 1.2 Relation to Protein Structure Comparison -- 1.3 Measures for Protein Alignments -- 2 Methods -- 2.1 Our Approach -- 2.2 Overview of the Algorithm -- 2.3 Seed Enumeration -- 2.4 Seed Extension -- 2.5 Extension Filtering -- 2.6 Guarantees on Resulting Alignments' RMSD Scores -- 2.7 Result Ranking -- 3 Parallelism.
3.1 Overview of the Implemented Parallelism.
Item type: Electronic Book
Current location: UT Tyler Online (Online)
Call number: QA76.58 .P373 2014
URL: https://ebookcentral.proquest.com/lib/uttyler/detail.action?docID=3096803
Status: Available
Barcode: EBC3096803


Description based on publisher supplied metadata and other sources.
