From device to context

As far as the Vulkan C API is concerned, once we have a device and a queue, we can use pretty much every part of Vulkan. But on a more practical level, there are some extra conveniences that we will want to set up now in order to make our later work easier, and as a practical matter the vulkano high-level API forces us to set up pretty much all of them anyway. These include…

  • Sub-allocators, which let us perform fewer, larger allocations from the GPU driver1, and slice them up into the many small API objects that we need.
  • Pipeline caches, which let us avoid recompiling the GPU-side code when neither the code nor the GPU driver has changed. This makes our application run faster, because in GPU applications a large part of the GPU-side compilation work is done at runtime.

We will take care of this remaining vulkano initialization work in this chapter.

Allocators

There are three kinds of objects that a Vulkan program may need to allocate in large numbers:

  • Memory objects, like buffers and images. These will contain most of the application data that we are directly interested in processing.
  • Command buffer objects. These are used to store batches of GPU commands that will then be sent to the GPU in order to ask it to do something: transfer data, execute our GPU code…
  • Descriptor set objects. These are used to attach a set of memory objects to a GPU program. The reason why Vulkan has them instead of making you bind memory objects to GPU programs one by one is that it allows the binding process to be more CPU-efficient, which matters in applications with short-running GPU programs that have many inputs and outputs.

These objects have very different characteristics (memory footprint, alignment, lifetime, CPU-side access patterns…), and therefore each benefits from its own specialized allocation logic. Accordingly, vulkano provides us with three standard allocators:

  • StandardMemoryAllocator for memory objects,
  • StandardCommandBufferAllocator for command buffers,
  • StandardDescriptorSetAllocator for descriptor sets.

As the Standard naming implies, these allocators are intended to be good enough for most applications, and easily replaceable when they don’t fit. I can tell you from experience that we are very unlikely to ever need to replace them for our Gray-Scott simulation, so we can just define a few type aliases as a minimal future-proofing measure…

pub type MemoryAllocator = vulkano::memory::allocator::StandardMemoryAllocator;
pub type CommandBufferAllocator =
    vulkano::command_buffer::allocator::StandardCommandBufferAllocator;
pub type DescriptorSetAllocator =
    vulkano::descriptor_set::allocator::StandardDescriptorSetAllocator;

…and a common initialization function that sets up all of them, using the default configuration, which fits the needs of the Gray-Scott simulation very well.

use std::sync::Arc;
use vulkano::{
    command_buffer::allocator::StandardCommandBufferAllocatorCreateInfo,
    descriptor_set::allocator::StandardDescriptorSetAllocatorCreateInfo,
};

fn create_allocators(
    device: Arc<Device>,
) -> (
    Arc<MemoryAllocator>,
    Arc<CommandBufferAllocator>,
    Arc<DescriptorSetAllocator>,
) {
    let malloc = Arc::new(MemoryAllocator::new_default(device.clone()));
    let calloc = Arc::new(CommandBufferAllocator::new(
        device.clone(),
        StandardCommandBufferAllocatorCreateInfo::default(),
    ));
    let dalloc = Arc::new(DescriptorSetAllocator::new(
        device,
        StandardDescriptorSetAllocatorCreateInfo::default(),
    ));
    (malloc, calloc, dalloc)
}

You may reasonably wonder why we need to manually wrap the allocators in an Arc (atomically reference-counted) smart pointer, when almost every other API in vulkano returns an Arc for you. This API inconsistency originates from the fact that vulkano is currently reworking the API of its memory allocators, and had to release v0.34 in the middle of that rework due to user pressure. So things will hopefully improve in the next vulkano release.

Pipeline cache

Why it exists

GPU programs (also called shaders by graphics programmers and kernels by compute programmers) tend to have a more complex compilation process than CPU programs for a few reasons:

  • At the time when you compile your application, you generally don’t know which GPU it will eventually run on. Needing to recompile the code anytime you want to run on a different GPU, either on the same multi-GPU machine or on a different machine, is a nuisance that many wise people would like to avoid.
  • Compared to CPU manufacturers, GPU manufacturers are a lot less attached to the idea of having open ISA specifications. They would rather not fully disclose how their hardware ISA works, and instead only let you manipulate it through a lawyer-approved abstraction layer that leaks fewer (presumed) trade secrets and allows them to transparently change more of their hardware architecture from one generation of hardware to the next.
  • Because this abstraction layer is fully managed by the GPU driver, the translation of a given program from the fake manufacturer ISA to the actual hardware ISA is not fixed in time and can change from one version of the GPU driver to the next, as new optimizations are discovered.

Because of this, ahead-of-time compilation of GPU code to the true hardware ISA pretty much does not exist. Instead, some degree of just-in-time compilation is always used. And with that comes the question of how much time your application can spend recompiling the GPU code on every run.

Vulkan approaches this problem from two angles:

  • First of all, your GPU code is translated during application compilation into an intermediate representation called SPIR-V, which plays a role similar to LLVM IR’s. The Vulkan implementation is then specified in terms of SPIR-V. This has several advantages:
    • Hardware vendors, who are not renowned for their compilation expertise, do not need to manipulate higher-level languages like C anymore. Compared with earlier GPU APIs, which trusted them with this, this is a major improvement in GPU driver reliability.
    • Some of the compilation and optimization work can be done at application compilation time, as illustrated by the sketch after this list. This reduces the amount of work that the GPU driver’s compiler needs to do at application runtime, and the variability of application performance across GPU drivers.
  • Second, the result of translating the SPIR-V code to hardware-specific machine code can be saved into a cache and reused in later application runs. For reasons that will become clear later in this course, this cache is called a pipeline cache.
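
For illustration, here is a minimal sketch of this application-compile-time step, using the vulkano-shaders companion crate (which this project does not necessarily use in this exact form). Its shader! macro compiles GLSL into SPIR-V when the application is built, so that only SPIR-V ends up embedded in the binary. The do-nothing compute shader below is a made-up example, not Gray-Scott code:

mod do_nothing_shader {
    // At application build time, this macro invokes a GLSL-to-SPIR-V compiler
    // and embeds the resulting SPIR-V into the binary. It also generates a
    // `load()` function that hands this SPIR-V to the Vulkan driver at runtime.
    vulkano_shaders::shader! {
        ty: "compute",
        src: r"
            #version 460
            layout(local_size_x = 64) in;
            void main() {
                // Do nothing: this shader only illustrates the process
            }
        ",
    }
}

At runtime, the GPU driver still has to translate this SPIR-V into its hardware ISA, and that remaining translation work is precisely what the pipeline cache discussed below memoizes.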

Now, one problem with caches is that they can be invalidated by changes in hardware or GPU driver versions. But Vulkan fully manages this for you. What is left up to you, however, is the work of saving this cache to disk when the application terminates, and reloading it on the next run of the application. Basically, you get a bunch of bytes from Vulkan, and Vulkan entrusts you with saving those bytes somewhere and giving them back unchanged on the next application run.

The point of exposing this process to the application, instead of hiding it like earlier GPU APIs did, is that it gives advanced Vulkan applications the power to…

  • Invalidate the cache themselves when the Vulkan driver has a bug which leads to cache invalidation failure or corruption.2
  • Provide pre-packaged caches for all common GPU drivers, so that even the first run of an application is likely to be fast, without JIT compilation of GPU code.

How we handle it

In our case, we will not do anything fancy with the Vulkan pipeline cache. We will just mimic what other GPU APIs do under the hood, by saving it to the operating system’s standard cache directory at application teardown time and loading it back, if it exists, at application startup time.

First of all, we need to know where the operating system’s standard cache directory is located. Annoyingly, this location is very much non-obvious, changes from one OS to another, and has varied across the history of each OS. But thankfully, there is a crate for that: directories.

First we add it as an optional dependency…

cargo add --optional directories

…and within the project’s Cargo.toml, we make it part of our gpu optional feature.

[features]
gpu = ["dep:directories", "dep:vulkano"]

We then use directories to locate the OS’ standard application data storage directories:

use directories::ProjectDirs;

let dirs = ProjectDirs::from("", "", "grayscott")
                       .expect("Could not find home directory");

For this simple application, we handle weird OS configurations where the user’s home directory cannot be found by panicking. A more sophisticated application might decide instead not to cache GPU pipelines in this case, or even get dangerous and try random hardcoded paths like /var/grayscott.cache just in case they work out.
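
As a minimal sketch of that more forgiving alternative (this helper is hypothetical, not part of the course code), skipping cache persistence would just be a matter of propagating the Option that ProjectDirs::from() already returns:

use directories::ProjectDirs;
use std::path::PathBuf;

/// Hypothetical helper: location of the pipeline cache, or None if the
/// user's home directory (and thus the standard cache directory) is unknown
fn optional_cache_path() -> Option<PathBuf> {
    ProjectDirs::from("", "", "grayscott")
        .map(|dirs| dirs.cache_dir().join("PipelineCache.bin"))
}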

Finally, we use the computed directories to make a simple abstraction for cache persistence:

use std::path::PathBuf;
use vulkano::pipeline::cache::{PipelineCache, PipelineCacheCreateInfo};

/// Simple cache management abstraction
pub struct PersistentPipelineCache {
    /// Standard OS data directories for this project: cache, config, etc
    dirs: ProjectDirs,

    /// Vulkan pipeline cache
    pub cache: Arc<PipelineCache>,
}
//
impl PersistentPipelineCache {
    /// Set up a pipeline cache, integrating previously cached data
    fn new(device: Arc<Device>) -> Result<Self, Validated<VulkanError>> {
        // Find standard OS directories
        let dirs = ProjectDirs::from("", "", "grayscott")
                               .expect("Could not find home directory");

        // Try to load former cache data, use empty data on error
        let initial_data = std::fs::read(Self::cache_path(&dirs)).unwrap_or_default();

        // Build Vulkan pipeline cache
        //
        // This is unsafe because we solemnly promise to Vulkan that we did not
        // fiddle with the bytes of the cache. And since this is GPU vendor
        // code, you better not expect it to validate its inputs.
        let cache = unsafe {
            PipelineCache::new(
                device,
                PipelineCacheCreateInfo {
                    initial_data,
                    ..Default::default()
                },
            )?
        };
        Ok(Self { dirs, cache })
    }

    /// Save the pipeline cache
    ///
    /// It is recommended to call this method manually, rather than let the
    /// destructor save the cache automatically, as this lets you control how
    /// errors are reported and integrate it into a broader application-wide
    /// error handling policy.
    pub fn try_save(&mut self) -> Result<(), Box<dyn Error>> {
        // Make sure the cache directory exists, then save the cache data
        std::fs::create_dir_all(self.dirs.cache_dir())?;
        std::fs::write(Self::cache_path(&self.dirs), self.cache.get_data()?)?;
        Ok(())
    }

    /// Compute the pipeline cache path
    fn cache_path(dirs: &ProjectDirs) -> PathBuf {
        dirs.cache_dir().join("PipelineCache.bin")
    }
}
//
impl Drop for PersistentPipelineCache {
    fn drop(&mut self) {
        // Cannot cleanly report errors in destructors
        if let Err(e) = self.try_save() {
            eprintln!("Failed to save Vulkan pipeline cache: {e}");
        }
    }
}

Putting it all together

As you can see, setting up Vulkan involves a number of steps. In computer graphics, the tradition is to regroup all of these steps into the constructor of a large Context struct, whose members hold all the API objects that we expect to need later on. We will follow this tradition:

pub struct VulkanContext {
    /// Logical device (used for resource allocation)
    pub device: Arc<Device>,

    /// Command queue (used for command submission)
    pub queue: Arc<Queue>,

    /// Memory object allocator
    pub memory_alloc: Arc<MemoryAllocator>,

    /// Command buffer allocator
    pub command_alloc: Arc<CommandBufferAllocator>,

    /// Descriptor set allocator
    pub descriptor_alloc: Arc<DescriptorSetAllocator>,

    /// Pipeline cache with on-disk persistence
    pub pipeline_cache: PersistentPipelineCache,

    /// Messenger that prints out Vulkan debug messages until destroyed
    _messenger: DebugUtilsMessenger,
}
//
impl VulkanContext {
    /// Set up the Vulkan context
    pub fn new(
        debug_println: impl Fn(String) + RefUnwindSafe + Send + Sync + 'static,
    ) -> Result<Self, Box<dyn Error>> {
        let (instance, messenger) = create_instance(debug_println)?;
        // Not imposing any extra constraint on devices for now
        let physical_device = pick_physical_device(&instance, |_device| true)?;
        let (device, queue) = create_device_and_queue(physical_device)?;
        let (memory_alloc, command_alloc, descriptor_alloc) = create_allocators(device.clone());
        let pipeline_cache = PersistentPipelineCache::new(device.clone())?;
        Ok(Self {
            device,
            queue,
            memory_alloc,
            command_alloc,
            descriptor_alloc,
            pipeline_cache,
            _messenger: messenger,
        })
    }

    /// Run all inner manual destructors
    pub fn finish(&mut self) -> Result<(), Box<dyn Error>> {
        self.pipeline_cache.try_save()?;
        Ok(())
    }
}

And with that, we have wrapped up what the gpu::context module of the provided course skeleton does. The rest will be integrated next!

Exercise

In the Rust project that you have been working on so far, the above GPU support is already present, but it is in a dedicated gpu module that is only compiled in when the gpu compile-time feature is enabled. This is achieved using a #[cfg(feature = "gpu")] compiler directive.

So far, that build feature has been disabled by default. This allowed you to enjoy faster builds, unpolluted by the cost of building GPU dependencies like vulkano. These are huge libraries with a significant compilation cost, because Vulkan itself is a huge specification.

However, now that we actually do want to run with GPU support enabled, the default of not building the GPU code is not the right one anymore. Therefore, please add the following line to the [features] section of the Cargo.toml file at the root of the repository:

default = ["gpu"]

This has the same effect as passing --features=gpu to every cargo command that you will subsequently run: it will enable the optional gpu compile-time feature of the project, along with associated optional dependencies.
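
For reference, assuming no other features have been added in the meantime, the [features] section should end up looking like this:

[features]
default = ["gpu"]
gpu = ["dep:directories", "dep:vulkano"]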

You will then want to add a VulkanContext::new() call at the beginning of the simulate microbenchmark and binary, and a call to the finish() method at the end.

In the microbenchmark, you will want to handle errors by panicking: just call the expect() method of the output Results with an error message. In bin/simulate, you may instead propagate the errors out of main by switching to the more general result type Result<(), Box<dyn Error>>.
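
As a rough sketch of the bin/simulate side (the import path and the elided simulation code are placeholders for your project's actual layout, not its real identifiers), the result could look like this:

// Hypothetical import: adjust the crate name to your project's layout
use grayscott::gpu::context::VulkanContext;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Basic Vulkan setup, forwarding Vulkan debug messages to stderr
    let mut context = VulkanContext::new(|msg| eprintln!("{msg}"))?;

    // ... set up and run the simulation as before ...

    // Save the pipeline cache, reporting failures through main()'s Result
    context.finish()?;
    Ok(())
}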

If you have done everything right, both binaries should now compile and run successfully. They are not doing anything useful with Vulkan yet, but the absence of errors means that our GPU driver/emulation can perform the basic setup work as expected, which is already a start!


1

Recall that the GPU driver’s memory allocators are slow and can have other weird and undesirable properties, like heavily limiting the total number of allocations or acquiring global driver mutexes.

2

Did I already mention that GPU hardware manufacturers are not renowned for their programming skills?