Professional CUDA C Programming.

By: Cheng, John
Contributor(s): Grossman, Max | McKercher, Ty
Material type: Text
Series: eBooks on Demand
Publisher: Somerset : Wiley, 2014
Description: 1 online resource (527 p.)
ISBN: 9781118739273
Subject(s): Computer architecture | Multiprocessors | Parallel processing (Electronic computers)
Genre/Form: Electronic books.
Additional physical formats: Print version: Professional CUDA C Programming
DDC classification: 004.22
LOC classification: QA76.9.A73 -- .C446 2014eb
Contents:
Cover -- Title Page -- Copyright -- Contents -- Chapter 1 Heterogeneous Parallel Computing with CUDA -- Parallel Computing -- Sequential and Parallel Programming -- Parallelism -- Computer Architecture -- Heterogeneous Computing -- Heterogeneous Architecture -- Paradigm of Heterogeneous Computing -- CUDA: A Platform for Heterogeneous Computing -- Hello World from GPU -- Is CUDA C Programming Difficult? -- Summary -- Chapter 2 CUDA Programming Model -- Introducing the CUDA Programming Model -- CUDA Programming Structure -- Managing Memory -- Organizing Threads -- Launching a CUDA Kernel -- Writing Your Kernel -- Verifying Your Kernel -- Handling Errors -- Compiling and Executing -- Timing Your Kernel -- Timing with CPU Timer -- Timing with nvprof -- Organizing Parallel Threads -- Indexing Matrices with Blocks and Threads -- Summing Matrices with a 2D Grid and 2D Blocks -- Summing Matrices with a 1D Grid and 1D Blocks -- Summing Matrices with a 2D Grid and 1D Blocks -- Managing Devices -- Using the Runtime API to Query GPU Information -- Determining the Best GPU -- Using nvidia-smi to Query GPU Information -- Setting Devices at Runtime -- Summary -- Chapter 3 CUDA Execution Model -- Introducing the CUDA Execution Model -- GPU Architecture Overview -- The Fermi Architecture -- The Kepler Architecture -- Profile-Driven Optimization -- Understanding the Nature of Warp Execution -- Warps and Thread Blocks -- Warp Divergence -- Resource Partitioning -- Latency Hiding -- Occupancy -- Synchronization -- Scalability -- Exposing Parallelism -- Checking Active Warps with nvprof -- Checking Memory Operations with nvprof -- Exposing More Parallelism -- Avoiding Branch Divergence -- The Parallel Reduction Problem -- Divergence in Parallel Reduction -- Improving Divergence in Parallel Reduction -- Reducing with Interleaved Pairs -- Unrolling Loops -- Reducing with Unrolling -- Reducing with Unrolled Warps -- Reducing with Complete Unrolling -- Reducing with Template Functions -- Dynamic Parallelism -- Nested Execution -- Nested Hello World on the GPU -- Nested Reduction -- Summary -- Chapter 4 Global Memory -- Introducing the CUDA Memory Model -- Benefits of a Memory Hierarchy -- CUDA Memory Model -- Memory Management -- Memory Allocation and Deallocation -- Memory Transfer -- Pinned Memory -- Zero-Copy Memory -- Unified Virtual Addressing -- Unified Memory -- Memory Access Patterns -- Aligned and Coalesced Access -- Global Memory Reads -- Global Memory Writes -- Array of Structures versus Structure of Arrays -- Performance Tuning -- What Bandwidth Can a Kernel Achieve? -- Memory Bandwidth -- Matrix Transpose Problem -- Matrix Addition with Unified Memory -- Summary -- Chapter 5 Shared Memory and Constant Memory -- Introducing CUDA Shared Memory -- Shared Memory -- Shared Memory Allocation
Summary: Break into the powerful world of parallel GPU programming with this down-to-earth, practical guide. Designed for professionals across multiple industrial sectors, Professional CUDA C Programming presents the fundamentals of CUDA, a parallel computing platform and programming model designed to ease the development of GPU programming, in an easy-to-follow format, and teaches readers how to think in parallel and implement parallel algorithms on GPUs. Each chapter covers a specific topic, and includes workable examples that demonstrate the development process, allowing readers
Item type: Electronic Book
Current location: UT Tyler Online (Online)
Call number: QA76.9.A73 -- .C446 2014eb
URL: http://uttyler.eblib.com/patron/FullRecord.aspx?p=1776323
Status: Available
Barcode: EBL1776323

Description based upon print version of record.

Author notes provided by Syndetics

John Cheng, PhD, is a Research Scientist at BGP International in Houston. He has developed seismic imaging products with GPU technology and many high-performance parallel production applications on heterogeneous computing platforms.

Max Grossman is an expert in GPU computing with experience applying CUDA to problems in medical imaging, machine learning, geophysics, and more.

Ty McKercher has been employed at NVIDIA since 2008, where he has been helping customers adopt GPU acceleration technologies.
