CUDA.jl 5.1: Unified memory and cooperative groups
JuliaGPU
by Tim Besard
6M ago
CUDA.jl 5.1 greatly improves support for two important parts of the CUDA toolkit: unified memory, for accessing GPU memory on the CPU and vice versa, and cooperative groups, which offer a more modular approach to kernel programming. This blog post is located at https://info.juliahub.com/cuda-jl-5-1-unified-memory
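To make the unified-memory feature concrete, here is a minimal sketch based on the keyword named in the release announcement (illustrative only; CPU-side access patterns may vary by version):

using CUDA

# Allocate an array backed by unified memory, visible to both CPU and GPU.
a = cu([1f0, 2f0, 3f0]; unified=true)

a .+= 1     # participates in GPU broadcasts like any CuArray
first(a)    # and, per the announcement, can be read from the CPU without an explicit copy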
CUDA.jl 5.0: Integrated profiler and task synchronization changes
JuliaGPU
by Tim Besard
7M ago
CUDA.jl 5.0 is a major release that adds an integrated profiler to CUDA.jl, and reworks how tasks are synchronized. The release is slightly breaking, as it changes how local toolkits are handled and raises the minimum Julia and CUDA versions. This blog post is located at https://info.juliahub.com/cuda-jl-5-0-changes
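For a flavor of the integrated profiler, a minimal sketch (CUDA.@profile is the entry point named in the release; the external keyword for deferring to NVIDIA's tools follows the announcement's description):

using CUDA

a = CUDA.rand(1024, 1024)
CUDA.@profile a * a                  # prints a summary of host and device activity
# CUDA.@profile external=true a * a  # defers to an external profiler such as Nsight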
Profiling oneAPI.jl applications with VTune
JuliaGPU
by Tim Besard
9M ago
Profiling GPU applications is hard, so this post shows how to use Intel's VTune Profiler to profile GPU applications written in Julia with oneAPI.jl. Because of the asynchronous nature of GPU execution, profiling GPU applications with Julia's tried and tested tools like @profile or even @time can be misleading: they will only show the time spent on the CPU, and will likely report that your application is spending most of its time waiting for the GPU. To get a better understanding of what is happening on the GPU, we need specialized tools. In this post, we'll show how to use Intel's VTune Profiler …
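The host-side timing pitfall can be sketched as follows (illustrative; this assumes oneAPI.jl exposes a synchronize() barrier akin to CUDA.jl's, which is an assumption on our part):

using oneAPI

a = oneArray(rand(Float32, 4096, 4096))

@time a * a                          # misleading: returns once the work is queued
@time (a * a; oneAPI.synchronize())  # waits for the GPU, closer to the real cost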
Metal.jl 0.2: Metal Performance Shaders
JuliaGPU
by Tim Besard
1y ago
Metal.jl 0.2 marks a significant milestone in the development of the Metal.jl package. The release comes with initial support for the Metal Performance Shaders (MPS) framework for accelerating common operations like matrix multiplications, as well as various improvements for writing Metal kernels in Julia.
Metal Performance Shaders
Quoting the Apple documentation, "The Metal Performance Shaders (MPS) framework contains a collection of highly optimized compute and graphics shaders for use in Metal applications." With Metal.jl 0.2, we have added initial support for this framework, and used it to accelerate …
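A minimal sketch of the kind of operation MPS now accelerates (illustrative; per the release, matrix multiplication of supported element types routes through MPS):

using Metal

a = MtlArray(rand(Float32, 512, 512))
b = MtlArray(rand(Float32, 512, 512))

c = a * b    # dispatched to Metal Performance Shaders when supported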
oneAPI.jl 1.0: oneMKL, Intel Arc and Julia 1.9
JuliaGPU
by Tim Besard
1y ago
The release of oneAPI.jl 1.0 adds integration with the oneAPI Math Kernel Library (oneMKL) to accelerate linear algebra operations on Intel GPUs. It also brings support for Julia 1.9 and Intel Arc GPUs.
oneMKL integration
oneAPI.jl now uses the Intel oneAPI Math Kernel Library (oneMKL), automatically downloaded as part of oneAPI_Support_jll.jl, to accelerate a great number of BLAS and LAPACK operations on Intel GPUs. Similar to how it is implemented in our other GPU back-ends, these wrappers are available at different levels of abstraction. At the lowest level, we use a C library that wraps the …
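At the highest level, the integration means ordinary linear algebra on oneArrays hits oneMKL. A minimal sketch:

using oneAPI, LinearAlgebra

a = oneArray(rand(Float32, 256, 256))
b = oneArray(rand(Float32, 256, 256))
c = similar(a)

mul!(c, a, b)    # forwarded to oneMKL's GEMM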
Technical preview: Programming Apple M1 GPUs in Julia with Metal.jl
JuliaGPU
by Tim Besard
1y ago
Julia has gained a new GPU back-end: Metal.jl, for working with Apple's M1 GPUs. The back-end is built on the same foundations that make up existing GPU packages like CUDA.jl and AMDGPU.jl, so it should be familiar to anybody who's already programmed GPUs in Julia. In the following post I'll demonstrate some of that functionality and explain how it works. But first, note that Metal.jl is under heavy development: the package is considered experimental for now, as we're still working on squashing bugs and adding essential functionality. We also haven't optimized for performance yet. If you're in …
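As a taste of that functionality, a minimal vector-add kernel in the style of the announcement (an illustrative sketch):

using Metal

function vadd(a, b, c)
    i = thread_position_in_grid_1d()
    c[i] = a[i] + b[i]
    return
end

a = MtlArray(rand(Float32, 1024))
b = MtlArray(rand(Float32, 1024))
c = similar(a)

@metal threads=length(a) vadd(a, b, c)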
CUDA.jl 3.5-3.8
JuliaGPU
by Tim Besard
1y ago
CUDA.jl versions 3.5 to 3.8 have brought several new features to improve performance and productivity. This blog post will highlight a couple: direct copies between devices, better performance by preserving array index types and changing the memory pool, and a much-improved interface to the compute sanitizer utility.
Copies between devices
Typically, when sending data between devices you need to stage through the CPU. CUDA.jl now handles this automatically, making it possible to directly copy between CuArrays on different devices:

julia> device!(0);

julia> a = CUDA.rand(2,2)
2×2 CuArray{Float32, …} …
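The truncated session presumably continues with the copy itself; a hedged sketch of what that looks like (copyto! between arrays on different devices is the feature being described):

julia> device!(1);

julia> b = CuArray{Float32}(undef, 2, 2);

julia> copyto!(b, a);   # direct device-to-device copy, no CPU staging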
oneAPI.jl status update
JuliaGPU
by Tim Besard
1y ago
It has been over a year since the last update on oneAPI.jl, the Julia package for programming Intel GPUs (and other accelerators) using the oneAPI toolkit. Since then, the package has been under steady development, and several new features have been added to improve the developer experience and usability of the package.
@atomic intrinsics
oneAPI.jl now supports atomic operations, which are required to implement a variety of parallel algorithms. Low-level atomic functions (atomic_add!, atomic_xchg!, etc.) are available as unexported methods in the oneAPI module:

a = oneArray(Int32[0])
function k…
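A hedged completion of the truncated example (the kernel name and launch size are guesses; the pointer-based atomic_add! signature mirrors CUDA.jl's):

a = oneArray(Int32[0])

function kernel(a)    # hypothetical completion of the truncated "function k…"
    oneAPI.atomic_add!(pointer(a), Int32(1))
    return
end

@oneapi items=16 kernel(a)
Array(a)    # [16]: every work-item incremented the counter once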
CUDA.jl 3.3
JuliaGPU
by Tim Besard
1y ago
There have been several releases of CUDA.jl in the past couple of months, with many bugfixes and many exciting new features to improve GPU programming in Julia: CuArray now supports isbits Unions, CUDA.jl can emit debug info for use with NVIDIA tools, and changes to the compiler make it even easier to use the latest version of the CUDA toolkit.
CuArray support for isbits Unions
Unions are a way to represent values of one type or another, e.g., a value that can be an integer or a floating-point number. If all possible element types of a Union are so-called bitstypes, which can be stored contiguously in …
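A minimal sketch of what isbits-Union storage enables (illustrative; Union{Int32, Nothing} is a typical isbits Union for representing missing values):

using CUDA

a = CuArray(Union{Int32, Nothing}[Int32(1), nothing, Int32(3)])

# Union-typed elements work inside GPU broadcasts and maps:
b = map(x -> x === nothing ? Int32(0) : x + Int32(1), a)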
CUDA.jl 3.4
JuliaGPU
by Tim Besard
1y ago
The latest version of CUDA.jl brings several new features, from improved atomic operations to initial support for arrays with unified memory. The native random number generator introduced in CUDA.jl 3.0 is now the default fallback, and support for memory pools other than the CUDA stream-ordered one has been removed.
Streamlined atomic operations
In preparation for integrating with the new standard @atomic macro introduced in Julia 1.7, we have streamlined the capabilities of atomic operations in CUDA.jl. The API is now split into two levels: low-level atomic_ methods for atomic functionality that …
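The two levels can be sketched as follows (the low-level pointer-based atomic_add! is documented; the macro form follows the announcement's description):

using CUDA

function counter(counts)
    CUDA.atomic_add!(pointer(counts), Int32(1))   # low-level, hardware-backed
    # high-level alternative: CUDA.@atomic counts[1] += Int32(1)
    return
end

counts = CUDA.zeros(Int32, 1)
@cuda threads=256 counter(counts)
Array(counts)    # [256]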
