From device to context
As far as the Vulkan C API is concerned, once we have a device and a queue, we
can use pretty much every part of Vulkan. On a more practical level, however,
there are some extra conveniences that we will want to set up now in order to
make our later work easier, and the vulkano high-level API forces us to set up
pretty much all of them anyway. These include…
- Sub-allocators, which let us perform fewer, larger allocations from the GPU driver¹, and slice them up into the small API objects that we need.
- Pipeline caches, which let us avoid recompiling GPU-side code when neither the code nor the GPU driver has changed. This makes our application run faster, because in GPU applications a large part of the GPU-side compilation work is done at runtime.
We will take care of this remaining vulkano initialization work in this chapter.
Allocators
There are three kinds of objects that a Vulkan program may need to allocate in large numbers:
- Memory objects, like buffers and images. These will contain most of the application data that we are directly interested in processing.
- Command buffer objects. These are used to store batches of GPU commands, which will then be sent to the GPU to ask it to do something: transfer data, execute our GPU code…
- Descriptor set objects. These are used to attach a set of memory objects to a GPU program. The reason why Vulkan has them instead of making you bind memory objects to GPU programs one by one is that it allows the binding process to be more CPU-efficient, which matters in applications with short-running GPU programs that have many inputs and outputs.
These objects have very different characteristics (memory footprint, alignment,
lifetime, CPU-side access patterns…), and therefore benefit from each
having their own specialized allocation logic. Accordingly, vulkano provides
us with three standard allocators:
- StandardMemoryAllocator for memory objects
- StandardCommandBufferAllocator for command buffers
- StandardDescriptorSetAllocator for descriptor sets
As the Standard naming implies, these allocators are intended to be good
enough for most applications, and easily replaceable when they don’t fit. I can
tell you from experience that we are very unlikely to ever need to replace them
for our Gray-Scott simulation, so we can just define a few type aliases as a
minimal future-proofing measure…
pub type MemoryAllocator = vulkano::memory::allocator::StandardMemoryAllocator;
pub type CommandBufferAllocator =
    vulkano::command_buffer::allocator::StandardCommandBufferAllocator;
pub type DescriptorSetAllocator =
    vulkano::descriptor_set::allocator::StandardDescriptorSetAllocator;
…and a common initialization function that sets all of them up, using the default configuration, which fits the needs of the Gray-Scott simulation very well.
use std::sync::Arc;
use vulkano::{
    command_buffer::allocator::StandardCommandBufferAllocatorCreateInfo,
    descriptor_set::allocator::StandardDescriptorSetAllocatorCreateInfo,
};

fn create_allocators(
    device: Arc<Device>,
) -> (
    Arc<MemoryAllocator>,
    Arc<CommandBufferAllocator>,
    Arc<DescriptorSetAllocator>,
) {
    let malloc = Arc::new(MemoryAllocator::new_default(device.clone()));
    let calloc = Arc::new(CommandBufferAllocator::new(
        device.clone(),
        StandardCommandBufferAllocatorCreateInfo::default(),
    ));
    let dalloc = Arc::new(DescriptorSetAllocator::new(
        device.clone(),
        StandardDescriptorSetAllocatorCreateInfo::default(),
    ));
    (malloc, calloc, dalloc)
}
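For reference, here is roughly how this function will be called later in this chapter. This is just an illustrative sketch, assuming that the device handle from the previous chapter is in scope as device:

// Hypothetical call site, assuming `device: Arc<Device>` from the previous chapter
let (memory_alloc, command_alloc, descriptor_alloc) = create_allocators(device.clone());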
You may reasonably wonder why we need to manually wrap the allocators in an
Arc (atomically reference-counted smart pointer), when almost every other API
in vulkano returns an Arc for you. This bad API originates from the fact that
vulkano is currently reworking the API of its memory allocators, and had to
release v0.34 in the middle of that rework due to user pressure. So things
will hopefully improve in the next vulkano release.
Pipeline cache
Why it exists
GPU programs (also called shaders by graphics programmers and kernels by compute programmers) tend to have a more complex compilation process than CPU programs for a few reasons:
- At the time where you compile your application, you generally don’t know what GPU it is eventually going to run on. Needing to recompile the code anytime you want to run on a different GPU, either on the same multi-GPU machine or on a different machine, is a nuisance that many wise people would like to avoid.
- Compared to CPU manufacturers, GPU manufacturers are a lot less attached to the idea of having open ISA specifications. They would rather not fully disclose how their hardware ISA works, and instead only make you manipulate it through a lawyer-approved abstraction layer, that leaks less (presumed) trade secrets and allows them to transparently change more of their hardware architecture from one generation of hardware to the next.
- Because this abstraction layer is fully managed by the GPU driver, the translation of a given program from the fake manufacturer ISA to the actual hardware ISA is not fixed in time, and can change from one version of the GPU driver to the next as new optimizations are discovered.
Because of this, ahead-of-time compilation of GPU code to the true hardware ISA pretty much does not exist. Instead, some degree of just-in-time compilation is always used. And with that comes the question of how much time your application can spend recompiling the GPU code on every run.
Vulkan approaches this problem from two angles:
- First of all, your GPU code is translated during application compilation into
an intermediate representation called SPIR-V, which plays a role similar to LLVM IR.
The Vulkan implementation is then specified in terms of SPIR-V (a concrete
sketch follows right after this list). This has several advantages:
- Hardware vendors, who are not renowned for their compilation expertise, do not need to manipulate higher-level languages anymore. Compared with earlier GPU APIs, which trusted them with this task, this is a major improvement in GPU driver reliability.
- Some of the compilation and optimization work can be done at application compilation time. This reduces the amount of work that the GPU driver’s compiler needs to do at application runtime, and the variability of application performance across GPU drivers.
- Second, the result of translating the SPIR-V code to hardware-specific machine code can be saved into a cache and reused in later application runs. For reasons that will become clear later in this course, this cache is called a pipeline cache.
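To make the first point more concrete, here is a minimal sketch of how GPU code can be turned into SPIR-V at application build time using the vulkano_shaders companion crate (an extra dependency that our project has not added yet). The kernel shown is a placeholder, not the Gray-Scott code; the way we will actually compile shaders in this course is covered later.

// Hedged sketch: compile a trivial GLSL compute shader to SPIR-V at build time.
// The vulkano_shaders::shader! macro runs at application compile time and embeds
// the resulting SPIR-V into the binary, so only the SPIR-V-to-hardware-ISA
// translation is left for the GPU driver to do at application runtime.
mod placeholder_shader {
    vulkano_shaders::shader! {
        ty: "compute",
        src: r"
            #version 460
            layout(local_size_x = 64) in;
            void main() {
                // Placeholder kernel: does nothing useful yet
            }
        ",
    }
}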
Now, one problem with caches is that they can be invalidated through changes in hardware or GPU driver versions. But Vulkan fully manages this part for you. What you need to handle is the work of saving this cache to disk when the application terminates, and reloading it on the next run of the application. Basically, you get a bunch of bytes from Vulkan, and Vulkan entrusts you with saving those bytes somewhere and giving them back unchanged on the next application run.
The point of exposing this process to the application, instead of hiding it like earlier GPU APIs did, is that it gives advanced Vulkan applications the power to…
- Invalidate the cache themselves when the Vulkan driver has a bug which leads to cache invalidation failure or corruption.²
- Provide pre-packaged caches for all common GPU drivers, so that even the first run of an application is likely to be fast, without JIT compilation of GPU code.
How we handle it
In our case, we will not do anything fancy with the Vulkan pipeline cache, just mimic what other GPU APIs do under the hood by saving it in the operating system’s standard cache directory at application teardown time and loading it back, if it exists, at application startup time.
First of all, we need to know where the operating system’s standard cache directory is located. Annoyingly, it is very much non-obvious, changes from one OS to another, and has varied across the history of each OS. But thankfully there’s a crate/library for that.
We start by adding it as an optional dependency…
cargo add --optional directories
…and within the project’s Cargo.toml, we make it part of our gpu optional feature:
[features]
gpu = ["dep:directories", "dep:vulkano"]
We then use directories to locate the OS’ standard application data storage directories:
use directories::ProjectDirs;

let dirs = ProjectDirs::from("", "", "grayscott")
    .expect("Could not find home directory");
For this simple application, we handle weird OS configurations where the user’s
home directory cannot be found by panicking. A more sophisticated application
might decide instead not to cache GPU pipelines in this case, or even get
dangerous and try random hardcoded paths like /var/grayscott.cache just in
case they work out.
Finally, we use the computed directories to make a simple abstraction for cache persistence:
use std::path::PathBuf;
use vulkano::pipeline::cache::{PipelineCache, PipelineCacheCreateInfo};

/// Simple cache management abstraction
pub struct PersistentPipelineCache {
    /// Standard OS data directories for this project: cache, config, etc.
    dirs: ProjectDirs,

    /// Vulkan pipeline cache
    pub cache: Arc<PipelineCache>,
}
//
impl PersistentPipelineCache {
    /// Set up a pipeline cache, integrating previously cached data
    fn new(device: Arc<Device>) -> Result<Self, Validated<VulkanError>> {
        // Find standard OS directories
        let dirs = ProjectDirs::from("", "", "grayscott")
            .expect("Could not find home directory");

        // Try to load former cache data, fall back to empty data on errors
        let initial_data = std::fs::read(Self::cache_path(&dirs)).unwrap_or_default();

        // Build the Vulkan pipeline cache
        //
        // This is unsafe because we solemnly promise to Vulkan that we did not
        // fiddle with the bytes of the cache. And since this is GPU vendor
        // code, you had better not expect it to validate its inputs.
        let cache = unsafe {
            PipelineCache::new(
                device,
                PipelineCacheCreateInfo {
                    initial_data,
                    ..Default::default()
                },
            )?
        };
        Ok(Self { dirs, cache })
    }

    /// Save the pipeline cache
    ///
    /// It is recommended to call this method manually, rather than let the
    /// destructor save the cache automatically, as this lets you control how
    /// errors are reported and integrate it into a broader application-wide
    /// error handling policy.
    pub fn try_save(&mut self) -> Result<(), Box<dyn Error>> {
        std::fs::write(Self::cache_path(&self.dirs), self.cache.get_data()?)?;
        Ok(())
    }

    /// Compute the pipeline cache path
    fn cache_path(dirs: &ProjectDirs) -> PathBuf {
        dirs.cache_dir().join("PipelineCache.bin")
    }
}
//
impl Drop for PersistentPipelineCache {
    fn drop(&mut self) {
        // Cannot cleanly report errors in destructors
        if let Err(e) = self.try_save() {
            eprintln!("Failed to save Vulkan pipeline cache: {e}");
        }
    }
}
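Here is a minimal sketch of how this abstraction is meant to be used. The pipeline-creation step in the middle is only a placeholder, since building compute pipelines is covered later in this course:

// Hedged usage sketch, assuming `device: Arc<Device>` is already available
let mut pipeline_cache = PersistentPipelineCache::new(device.clone())?;

// ...build compute pipelines here, handing them pipeline_cache.cache.clone()
// so that their compilation results get recorded into the cache...

// Explicitly persist the cache to disk, so that errors can be reported cleanly
pipeline_cache.try_save()?;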
Putting it all together
As you can see, setting up Vulkan involves a number of steps. In computer
graphics, the tradition is to regroup all of these steps into the constructor of
a large Context
struct whose members feature all API objects that we envision
to need later on. We will follow this tradition:
pub struct VulkanContext {
    /// Logical device (used for resource allocation)
    pub device: Arc<Device>,
    /// Command queue (used for command submission)
    pub queue: Arc<Queue>,
    /// Memory object allocator
    pub memory_alloc: Arc<MemoryAllocator>,
    /// Command buffer allocator
    pub command_alloc: Arc<CommandBufferAllocator>,
    /// Descriptor set allocator
    pub descriptor_alloc: Arc<DescriptorSetAllocator>,
    /// Pipeline cache with on-disk persistence
    pub pipeline_cache: PersistentPipelineCache,
    /// Messenger that prints out Vulkan debug messages until destroyed
    _messenger: DebugUtilsMessenger,
}
//
impl VulkanContext {
    /// Set up the Vulkan context
    pub fn new(
        debug_println: impl Fn(String) + RefUnwindSafe + Send + Sync + 'static,
    ) -> Result<Self, Box<dyn Error>> {
        let (instance, messenger) = create_instance(debug_println)?;
        // Not imposing any extra constraint on devices for now
        let physical_device = pick_physical_device(&instance, |_device| true)?;
        let (device, queue) = create_device_and_queue(physical_device)?;
        let (memory_alloc, command_alloc, descriptor_alloc) = create_allocators(device.clone());
        let pipeline_cache = PersistentPipelineCache::new(device.clone())?;
        Ok(Self {
            device,
            queue,
            memory_alloc,
            command_alloc,
            descriptor_alloc,
            pipeline_cache,
            _messenger: messenger,
        })
    }

    /// Run all inner manual destructors
    pub fn finish(&mut self) -> Result<(), Box<dyn Error>> {
        self.pipeline_cache.try_save()?;
        Ok(())
    }
}
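To illustrate how the context is meant to be used, here is a hedged sketch of a fallible main() function. The debug closure and the body in the middle are placeholders, not the course’s actual code:

// Hypothetical usage sketch of VulkanContext in a binary's main() function
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Route Vulkan debug messages to stderr
    let mut context = VulkanContext::new(|message| eprintln!("{message}"))?;

    // ...allocate resources and submit GPU work using context.device,
    // context.queue, the allocators, and the pipeline cache...

    // Run manual teardown steps, such as saving the pipeline cache to disk
    context.finish()?;
    Ok(())
}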
And with that, we have wrapped up what the gpu::context module of the provided course skeleton does. The rest will be integrated next!
Exercise
In the Rust project that you have been working on so far, the above GPU support
is already present, but it lives in a dedicated gpu module that is only compiled
in when the gpu compile-time feature is enabled. This is achieved using a
#[cfg(feature = "gpu")] compiler directive, as sketched below.
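For instance, the module declaration in the library’s top-level file likely looks something like this; the exact file layout belongs to the skeleton, so treat this as an illustration:

// In the crate root (e.g. src/lib.rs): only compile the GPU support code
// when the "gpu" Cargo feature is enabled
#[cfg(feature = "gpu")]
pub mod gpu;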
So far, that build feature has been disabled by default. This allowed you to
enjoy faster builds, unpolluted by the cost of building GPU dependencies like
vulkano. These are huge libraries with a significant compilation cost, because
Vulkan itself is a huge specification.
However, now that we actually do want to run with GPU support enabled, the
default of not building the GPU code is not the right one anymore. Therefore,
please add the following line to the [features] section of the Cargo.toml
file at the root of the repository:
default = ["gpu"]
This has the same effect as passing --features=gpu to every cargo command
that you will subsequently run: it will enable the optional gpu compile-time
feature of the project, along with the associated optional dependencies.
You will then want to add a VulkanContext::new() call at the beginning of the
simulate microbenchmark and binary, and a matching finish() call at the end.
Now, GPU APIs are fallible, so these methods return Results and you are
expected to handle the associated errors. We will showcase the flexibility of
Result by handling errors differently in microbenchmarks and in the main
simulate binary (a short sketch of the panicking style follows this list):
- In the microbenchmark, you will handle errors by panicking. Just call the expect() method on the results of the context functions and pass it an error message; if an error occurs, the program will panic with this error message.
- In bin/simulate, you will instead generalize the current hdf5-specific error type to Box<dyn Error>, so that you can propagate Vulkan errors out of main() just like we already do for HDF5 errors.
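Here is a hedged sketch of what the panicking style could look like in the microbenchmark; the surrounding benchmark code is omitted and the exact error messages are up to you:

// Hypothetical microbenchmark setup: panic with a message on any setup error
let mut context = VulkanContext::new(|message| eprintln!("{message}"))
    .expect("Failed to set up the Vulkan context");

// ...run the benchmark...

context
    .finish()
    .expect("Failed to tear down the Vulkan context");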
If you have done everything right, both binaries should now compile and run successfully. They are not doing anything useful with Vulkan yet, but the lack of errors means that our GPU driver/emulation can perform the basic setup work as expected, which is already a start!
¹ Recall that the GPU driver’s memory allocators perform slowly and can have other weird and undesirable properties, like heavily limiting the total number of allocations or acquiring global driver mutexes.
² Did I already mention that GPU hardware manufacturers are not renowned for their programming skills?