Shader
So far, we have mostly been working on common Vulkan boilerplate that could easily be reused in another application, with minor changes and extra configuration options. Now it’s time to actually start writing some Gray-Scott reaction simulation code.
In this chapter, we will write the part of the simulation code that actually runs on the GPU. Vulkan calls it a shader, but if you feel more comfortable calling it a kernel, I won’t judge you. I’ll let the Linux contributors in the room do it for me. 😉
Introducing GLSL
One benefit of Khronos’ decision to switch to the SPIR-V intermediate representation was that it became theoretically possible to use any compiled programming language you like to write the GPU-side code. All you need is a new compiler backend that emits SPIR-V instead of machine code.
Sadly, compiler backends are big pieces of work, so Rust is not quite there yet.
All we have today is a promising prototype called `rust-gpu`. I hope to introduce it in a future edition of this course¹, but it felt a bit too experimental for this year's edition.
Therefore, we will prefer the tried and true alternative of Khronos’ standard GPU programming language: the OpenGL Shading Language, or GLSL for short.
GLSL is, along with HLSL, one of the two most important programming languages in the history of GPU computing. It was originally designed as a companion to the OpenGL GPU API, the predecessor to Vulkan in Khronos’ family of GPU APIs, and therefore its specification can be found right next to the OpenGL specification on the Khronos Group website.
GLSL is a derivative of the C programming language, which dropped some of C's most controversial features² and introduced many extensions to meet the needs of GPU computing.
To give some examples, in GLSL you can easily…

- Specify how the GPU code will interface with Vulkan on the CPU side.
- Manipulate vectors and matrices, though only up to dimension 4x4.³
- Exploit GPU texturing units to more efficiently manipulate 2/3D arrays, interpolate 1/2/3D data, and perform on-the-fly conversions between many image formats.
- Selectively let the compiler optimize operations in a numerically unstable fashion (similar to GCC's `-ffast-math`, except SPIR-V makes it fine-grained and calls it `RelaxedPrecision`).
- Control how the GPU program's execution will be decomposed into work-groups (the Vulkan equivalent of NVidia CUDA's thread blocks).
- Allow the value of a compilation constant to be specified before the shader's SPIR-V is compiled into a device binary, which improves configurability and compiler optimizations.
- Enjoy a more extensive library of built-in math functions than C could ever dream of.
- …and many other things revolving around the high-speed drawing of textured triangles that we will not care much about in everyday numerical computing.⁴
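As a small, made-up illustration of these vector and matrix ergonomics (this snippet is not part of the simulation code, just a demonstration of the syntax):

```glsl
// Made-up snippet showing GLSL's built-in linear algebra ergonomics
void demo() {
    vec3 a = vec3(1.0, 2.0, 3.0);
    vec3 b = 2.0 * a.zyx;   // "swizzling" reorders components: b = (6, 4, 2)
    mat3 m = mat3(1.0);     // 3x3 identity matrix
    vec3 c = m * (a + b);   // matrix-vector product over an element-wise sum
    float norm = length(c); // one of many built-in math functions
}
```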
As you can see, this opens quite a few interesting possibilities. And that's part of why `rust-gpu` is still experimental in spite of having been in development for quite a while. Making a good new GPU programming language is actually not only a matter of adding a SPIR-V backend to the compiler (which is already a large amount of work), it's also a matter of extending the source programming language and its standard library to expose all the GPU features that GPU-specific programming languages have been exposing forever.⁵
Our first GLSL Gray-Scott
Let us now go through a GLSL implementation of the Gray-Scott reaction simulation. This first implementation isn't heavily tuned for performance; however, it does try to leverage GLSL-specific features wherever the impact on readability is small and runtime performance should benefit.
First of all, we specify which version of the GLSL specification the program is written against. Here, we are using the latest and greatest GLSL 4.60 specification.
```glsl
#version 460
```
Then we start to specify the interface between the CPU and the GPU code. This is a danger zone: anytime we change this part of the GPU code, we must remember to keep the matching code on the CPU side in sync, or else the GPU program will produce completely wrong results, if it runs at all.
```glsl
// Initial concentrations of species U and V
layout(set = 0, binding = 0) uniform sampler2D us;
layout(set = 0, binding = 1) uniform sampler2D vs;

// Final concentrations of species U and V
layout(set = 0, binding = 2) uniform restrict writeonly image2D nextUs;
layout(set = 0, binding = 3) uniform restrict writeonly image2D nextVs;

// Computation parameters uniform
layout(set = 1, binding = 0) uniform Parameters {
    mat3 weights;
    float diffusion_rate_u, diffusion_rate_v,
          feed_rate, kill_rate,
          time_step;
} params;

// Workgroup dimensions
layout(local_size_x = 8, local_size_y = 8) in;
```
There is a fair bit to unpack here:
- `us` and `vs` are 2D samplers representing our input data. There are two interfaces to GPU texturing units in Khronos APIs, images and samplers. Images let us leverage the optimized data layout of GPU textures for 2/3D spatial locality. Samplers build on top of images to provide us with optional interpolation features (not used here) and well-defined handling of out-of-bounds accesses (used here to enforce the simulation's zero boundary condition).
- `nextUs` and `nextVs` are 2D images representing our output data. Here we do not need samplers because we will not be performing out-of-bounds accesses on the output side. We can also help the GPU compiler optimize code better by telling it that these images do not alias with any other input/output data in memory and will not be read from.
- `params` is a struct that lets us pass simulation parameters to the GPU code. Notice that unlike the CPU code, the GPU code does not know about some computation parameters at compile time, which puts it at a slight performance disadvantage. By using specialization constants, we could reverse this and let the GPU code know about all simulation parameters at compile time.
- All variables introduced so far are `uniform`s, which means that all work-items (same thing as CUDA threads) in a given shader execution agree on their value. This greatly improves GPU code performance by letting the GPU hardware share associated state (registers, etc.) between concurrent computations, and is in fact mandatory for images and samplers.
- To enable binding associated resources from the CPU side, each of these inputs and outputs is assigned a set and binding number. Here, we have made all inputs and outputs part of set 0, and all parameters part of set 1. This set-binding hierarchy lets us flip simulation inputs/outputs in a single API transaction, without needing to rebind unchanging parameters.
- Finally, we specify the work-group size, which is Vulkan's equivalent of CUDA's thread-block size. For now, we hardcode it in the shader code, but later we could make it runtime-configurable using specialization constants.
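One practical consequence of the hardcoded 8x8 work-group size is that the CPU side will have to dispatch enough work-groups to cover the whole simulation grid, rounding up when the grid size is not a multiple of the group size. A minimal sketch of that arithmetic (the helper name and grid sizes are made up for illustration):

```rust
/// Number of work-groups needed to cover a 2D grid, rounding up so that
/// a partially covered edge still gets a (partial) work-group dispatched
/// (made-up helper, not part of the course's codebase)
fn work_group_count(grid_shape: [u32; 2], group_shape: [u32; 2]) -> [u32; 2] {
    [
        grid_shape[0].div_ceil(group_shape[0]),
        grid_shape[1].div_ceil(group_shape[1]),
    ]
}

fn main() {
    // With 8x8 work-groups, a hypothetical 1920x1080 grid needs 240x135 groups
    println!("{:?}", work_group_count([1920, 1080], [8, 8]));
}
```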
After specifying the CPU/GPU interface, we add a few simple utilities to make it easier to use. First, we write a pair of more ergonomic wrappers on top of GLSL’s very general but somewhat unwieldy image and sampler access functions…
```glsl
// Read the current value of the concentration
vec2 read_uv(const ivec2 input_idx) {
    const vec2 input_pos = vec2(input_idx) + 0.5;
    return vec2(
        texture(us, input_pos).r,
        texture(vs, input_pos).r
    );
}

// Write the next value of the concentration
void write_uv(const ivec2 output_idx, const vec2 uv) {
    imageStore(nextUs, output_idx, vec4(uv.x));
    imageStore(nextVs, output_idx, vec4(uv.y));
}
```
…and second, we give ourselves a way to refer to the stencil’s 2D dimensions and center position:
```glsl
// Diffusion stencil properties
const ivec2 stencil_shape = ivec2(params.weights.length(),
                                  params.weights[0].length());
const ivec2 stencil_offset = ivec2((stencil_shape - 1) / 2);
```
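To convince yourself that this offset arithmetic does what it says, here is the same computation replayed as a throwaway Rust sanity check (not part of the project):

```rust
/// Offset from the center of a stencil to its top-left corner, mirroring
/// the GLSL `stencil_offset` computation (throwaway sanity check)
fn stencil_offset(shape: [i32; 2]) -> [i32; 2] {
    [(shape[0] - 1) / 2, (shape[1] - 1) / 2]
}

fn main() {
    // A 3x3 stencil extends one cell in every direction around the center
    println!("{:?}", stencil_offset([3, 3]));
}
```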
Finally, we introduce the body of the Gray-Scott reaction computation:
```glsl
// What each shader invocation does
void main() {
    // Determine which central input and output location we act on
    const ivec2 center_idx = ivec2(gl_GlobalInvocationID.xy);

    // Read the center value of the concentration
    const vec2 uv = read_uv(center_idx);

    // Compute diffusion term for U and V
    vec2 full_uv = vec2(0.);
    const ivec2 top_left = center_idx - stencil_offset;
    for (int x = 0; x < stencil_shape.x; ++x) {
        for (int y = 0; y < stencil_shape.y; ++y) {
            const vec2 stencil_uv = read_uv(top_left + ivec2(x, y));
            full_uv += params.weights[x][y] * (stencil_uv - uv);
        }
    }
    const float diffusion_u = params.diffusion_rate_u * full_uv.x;
    const float diffusion_v = params.diffusion_rate_v * full_uv.y;

    // Deduce rate of change in u and v
    const float u = uv.x;
    const float v = uv.y;
    const float uv_square = u * v * v;
    const float du = diffusion_u - uv_square + params.feed_rate * (1.0 - u);
    const float dv = diffusion_v + uv_square
                     - (params.feed_rate + params.kill_rate) * v;

    // Update u and v accordingly
    write_uv(center_idx, uv + vec2(du, dv) * params.time_step);
}
```
Again, this could use some unpacking:
- Like CUDA, Vulkan and GLSL use a data-parallel execution model where each work-item is tasked with processing one element of a 1/2/3-dimensional grid. We can read the position that we are tasked to process from the `gl_GlobalInvocationID` global 3D vector. Because our problem is 2D, we extract the first two coordinates of this vector, and convert them from unsigned to signed integers, since the subsequent stencil index computations can produce negative values.
- We then compute the diffusion term of the differential equation. Here we played a little with GLSL's vector type to make the computations of U and V less redundant, and possibly a little more efficient (depending on how smart your GPU driver's compiler feels like being today).
- Finally, we update U and V using almost exactly the same logic as the CPU version.
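Because GPU results can be painful to debug, it is handy to have a CPU-side mirror of this per-cell update to test against. The following Rust sketch follows the same logic; it is hypothetical (names and data layout are made up, and the `weights` values in any usage are placeholders), with out-of-bounds reads returning zero like the shader's samplers:

```rust
/// Simulation parameters, mirroring the GLSL `Parameters` block
/// (hypothetical CPU-side reference, not the vulkano-generated type)
struct Params {
    weights: [[f32; 3]; 3],
    diffusion_rate_u: f32,
    diffusion_rate_v: f32,
    feed_rate: f32,
    kill_rate: f32,
    time_step: f32,
}

/// Read one concentration value, with a zero boundary condition like the
/// shader's samplers provide for out-of-bounds accesses
fn read(grid: &[Vec<f32>], x: i32, y: i32) -> f32 {
    if x < 0 || y < 0 || x as usize >= grid.len() || y as usize >= grid[0].len() {
        0.0
    } else {
        grid[x as usize][y as usize]
    }
}

/// Next (u, v) value at cell (x, y), following the same logic as the shader
fn update_cell(us: &[Vec<f32>], vs: &[Vec<f32>], x: i32, y: i32, p: &Params) -> (f32, f32) {
    let (u, v) = (read(us, x, y), read(vs, x, y));
    // Diffusion term: weighted sum of differences with neighboring cells
    let (mut full_u, mut full_v) = (0.0, 0.0);
    for dx in -1..=1i32 {
        for dy in -1..=1i32 {
            let w = p.weights[(dx + 1) as usize][(dy + 1) as usize];
            full_u += w * (read(us, x + dx, y + dy) - u);
            full_v += w * (read(vs, x + dx, y + dy) - v);
        }
    }
    // Reaction terms, identical to the GLSL code
    let uv_square = u * v * v;
    let du = p.diffusion_rate_u * full_u - uv_square + p.feed_rate * (1.0 - u);
    let dv = p.diffusion_rate_v * full_v + uv_square - (p.feed_rate + p.kill_rate) * v;
    (u + du * p.time_step, v + dv * p.time_step)
}

fn main() {
    // A uniform (u, v) = (1, 0) state is a fixed point of the update:
    // all diffusion differences and reaction terms vanish
    let us = vec![vec![1.0f32; 3]; 3];
    let vs = vec![vec![0.0f32; 3]; 3];
    let p = Params {
        weights: [[0.05, 0.2, 0.05], [0.2, -1.0, 0.2], [0.05, 0.2, 0.05]],
        diffusion_rate_u: 0.1,
        diffusion_rate_v: 0.05,
        feed_rate: 0.014,
        kill_rate: 0.054,
        time_step: 1.0,
    };
    println!("{:?}", update_cell(&us, &vs, 1, 1, &p));
}
```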
Basic CPU-side integration
Following the historical conventions of GPU programmers, we first save our GPU-side code into a source file with a `.comp` extension, which tells our IDE tooling that this is a compute shader. We propose that you call it `grayscott.comp` and put it inside of the `exercises/src/gpu` directory.
We then ensure that this shader is compiled to SPIR-V whenever our project is built by using the `vulkano-shaders` library provided by the `vulkano` project. We first add it as a dependency…
```bash
cargo add --optional vulkano-shaders
```
…and within the project's `Cargo.toml`, we make it part of our `gpu` optional feature:
```toml
[features]
gpu = ["dep:directories", "dep:vulkano", "dep:vulkano-shaders"]
```
Then, within our code's `gpu::pipeline` module, we create a new `shader` module and ask `vulkano_shaders` to generate the SPIR-V code and some Rust support code inside of it:
```rust
/// Compute shader used for GPU-side simulation
mod shader {
    vulkano_shaders::shader! {
        ty: "compute",
        path: "src/gpu/grayscott.comp",
    }
}
```
To make the code more maintainable, we give human-readable names to the shader binding points. These constants will still need to be kept up to date if the associated GLSL interface evolves over time, but at least we won’t need to update magic numbers spread all over the code:
```rust
/// Shader descriptor set to which input and output images are bound
pub const IMAGES_SET: u32 = 0;

/// Descriptor within `IMAGES_SET` for sampling of input U concentration
pub const IN_U: u32 = 0;

/// Descriptor within `IMAGES_SET` for sampling of input V concentration
pub const IN_V: u32 = 1;

/// Descriptor within `IMAGES_SET` for writing to output U concentration
pub const OUT_U: u32 = 2;

/// Descriptor within `IMAGES_SET` for writing to output V concentration
pub const OUT_V: u32 = 3;

/// Shader descriptor set to which simulation parameters are bound
pub const PARAMS_SET: u32 = 1;

/// Descriptor within `PARAMS_SET` for simulation parameters
pub const PARAMS: u32 = 0;

/// Work-group shape of the shader
pub const WORK_GROUP_SHAPE: [u32; 2] = [8, 8];
```
Finally, we will need a Rust struct which matches the definition of the GPU-side simulation parameters. Thankfully, `vulkano_shaders` saves us from the trouble of a duplicate type definition by automatically generating a Rust version of each type from the input GLSL code. All we need to do is to expose it so other code can use it:
```rust
/// Simulation parameters, in a GPU-friendly layout
///
/// This syntax lets you re-export a type defined in another module as if you
/// defined said type yourself
pub use shader::Parameters;
```
And that’s it: now we have autogenerated SPIR-V in our code, along with a CPU-side version of our CPU-GPU interface declarations.
Exercise
Integrate all of the above into the Rust project's `gpu::pipeline` module, and make sure that the code still compiles. A test run is not useful here, as runtime behavior is unchanged for now.
¹ Along with `krnl`, a CUDA-like simplified GPU computing layer built on top of `rust-gpu` and Vulkan.
² In particular, GLSL removed a lot of C's implicit conversions, and fully dropped pointers. The latter are largely replaced by much improved dynamically sized array support and function in/out/inout parameters.
³ What else would you expect from a language designed by and for computer graphics experts?
⁴ …though people these days are increasingly interested in the potential physics simulation applications of the hardware ray-tracing acceleration units that were introduced by recent GPUs from NVidia and AMD.
⁵ In case you are wondering, on the C++ side, CUDA and SYCL mostly ignore this problem by not supporting half of GLSL's most useful features. But the `rust-gpu` developers aim for performance parity with well-optimized GLSL code, and are therefore unwilling to settle for this shortcut.