Gray-Scott introduction

We are now ready to introduce the final boss computation of this course: the Gray-Scott reaction simulation. In this chapter, we will take a rapid tour of the pre-made setup that is provided to you for input initialization, kernel benchmarking, and HDF5 output production. We will then conclude this tour by showing how one would implement a simple, unoptimized version of the simulation using ndarray.

Along the way, you will also get a quick glimpse of how Rust’s structs and methods work. We did not have time to explore this area of the language within the course’s 1-day format, but in a nutshell, structs and impl blocks can be used to implement encapsulated objects, much like C++ classes, and that is what we do here.

Input initialization

The reference C++ implementation of the Gray-Scott simulation hardcodes the initial input state of the simulation. For the sake of keeping things simple and comparable, we will do the same:

use ndarray::Array2;

/// Computation precision
///
/// It is a good idea to make this easy to change in your programs, especially
/// if you care about hardware portability or have ill-specified output
/// precision requirements.
///
/// Notice the "pub" keyword for exposing this to the outside world. All types,
/// functions... are private to the current code module by default in Rust.
pub type Float = f32;

/// Storage for the concentrations of the U and V chemical species
pub struct UV {
    pub u: Array2<Float>,
    pub v: Array2<Float>,
}
//
/// impl blocks like this let us add methods to types in Rust
impl UV {
    /// Set up the hardcoded chemical species concentration
    ///
    /// Notice the `Self` syntax which allows you to refer to the type for which
    /// the method is implemented.
    fn new(num_rows: usize, num_cols: usize) -> Self {
        let shape = [num_rows, num_cols];
        let pattern = |row, col| {
            (row >= (7 * num_rows / 16).saturating_sub(4)
                && row < (8 * num_rows / 16).saturating_sub(4)
                && col >= 7 * num_cols / 16
                && col < 8 * num_cols / 16) as u8 as Float
        };
        let u = Array2::from_shape_fn(shape, |(row, col)| 1.0 - pattern(row, col));
        let v = Array2::from_shape_fn(shape, |(row, col)| pattern(row, col));
        Self { u, v }
    }

    /// Set up an all-zeroes chemical species concentration
    ///
    /// This can be faster than `new()`, especially on operating systems like
    /// Linux where all allocated memory is guaranteed to be initially zeroed
    /// out for security reasons.
    fn zeroes(num_rows: usize, num_cols: usize) -> Self {
        let shape = [num_rows, num_cols];
        let u = Array2::zeros(shape);
        let v = Array2::zeros(shape);
        Self { u, v }
    }

    /// Get the number of rows and columns of the simulation domain
    ///
    /// Notice the `&self` syntax for borrowing the object on which the method
    /// is being called by shared reference.
    pub fn shape(&self) -> [usize; 2] {
        let shape = self.u.shape();
        [shape[0], shape[1]]
    }
}
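
If you want to convince yourself that this initialization behaves as expected, here is a minimal sanity-check sketch. It assumes it lives in the same module as UV (new() is not pub, so it cannot be called from the outside), for example as a unit test:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn initial_state() {
        let uv = UV::new(16, 16);
        assert_eq!(uv.shape(), [16, 16]);
        // Away from the central square, U starts at 1 and V starts at 0
        assert_eq!(uv.u[[0, 0]], 1.0);
        assert_eq!(uv.v[[0, 0]], 0.0);
    }
}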

Double buffering

The Gray-Scott reaction is simulated by updating the concentrations of the U and V chemical species many times. This is done by reading the old concentrations of the chemical species from one array, and writing the new concentrations to another array.

We could be creating a new array of concentrations every time we do this, but this would require performing one memory allocation per simulation step, which can be expensive. Instead, it is more efficient to use the double buffering pattern. In this pattern, we keep two versions of the concentration in an array, and on every step of the simulation, we read from one of the array slots and write to the other array slot. Then we flip the role of the array slots for the next simulation step.

We can translate this pattern into a simple encapsulated object…

/// Double-buffered chemical species concentration storage
pub struct Concentrations {
    buffers: [UV; 2],
    src_is_1: bool,
}
//
impl Concentrations {
    /// Set up the simulation state
    pub fn new(num_rows: usize, num_cols: usize) -> Self {
        Self {
            buffers: [UV::new(num_rows, num_cols), UV::zeroes(num_rows, num_cols)],
            src_is_1: false,
        }
    }

    /// Get the number of rows and columns of the simulation domain
    pub fn shape(&self) -> [usize; 2] {
        self.buffers[0].shape()
    }

    /// Read out the current species concentrations
    pub fn current(&self) -> &UV {
        &self.buffers[self.src_is_1 as usize]
    }

    /// Run a simulation step
    ///
    /// The user callback function `step` will be called with two input UVs:
    /// one containing the initial species concentration at the start of the
    /// simulation step, and one to receive the final species concentration that
    /// the simulation step is in charge of generating.
    ///
    /// Notice the `&mut self` syntax for borrowing the object on which
    /// the method is being called by mutable reference.
    pub fn update(&mut self, step: impl FnOnce(&UV, &mut UV)) {
        let [ref mut uv_0, ref mut uv_1] = &mut self.buffers;
        if self.src_is_1 {
            step(uv_1, uv_0);
        } else {
            step(uv_0, uv_1);
        }
        self.src_is_1 = !self.src_is_1;
    }
}

…and then this object can be used to run the simulation like this:

// Set up the concentrations buffer
let mut concentrations = Concentrations::new(num_rows, num_cols);

// ... other initialization work ...

// Main simulation loop
let mut running = true;
while running {
    // Update the concentrations of the U and V chemical species
    concentrations.update(|start, end| {
        // TODO: Derive new "end" concentration from "start" concentration
        end.u.assign(&start.u);
        end.v.assign(&start.v);
    });

    // ... other per-step action, e.g. decide whether to keep running,
    // write the concentrations to disk from time to time ...
    running = false;
}

// Get the final concentrations at the end of the simulation
let result = concentrations.current();
println!("u is {:#?}", result.u);
println!("v is {:#?}", result.v);

HDF5 output

The reference C++ simulation lets you write down the concentration of the V chemical species to an HDF5 file every N computation steps. This can be used to check that the simulation works properly, or to turn the evolving concentration “pictures” into a video for visualization purposes.

Following its example, we will use the hdf5 Rust crate to write data to HDF5 too, using the same file layout conventions for interoperability. Here too, we will use an encapsulated object design to keep things easy to use correctly:

use hdf5::{Dataset, File};

/// Mechanism to write down results to an HDF5 file
pub struct HDF5Writer {
    /// HDF5 file handle
    file: File,

    /// HDF5 dataset
    dataset: Dataset,

    /// Number of images that were written so far
    position: usize,
}

impl HDF5Writer {
    /// Create or truncate the file
    ///
    /// The file will be dimensioned to store a certain amount of V species
    /// concentration arrays.
    ///
    /// The `Result` return type indicates that this method can fail and the
    /// associated I/O errors must be handled somehow.
    pub fn create(file_name: &str, shape: [usize; 2], num_images: usize) -> hdf5::Result<Self> {
        // The ? syntax lets us propagate errors from an inner function call to
        // the caller, when we cannot handle them ourselves.
        let file = File::create(file_name)?;
        let [rows, cols] = shape;
        let dataset = file
            .new_dataset::<Float>()
            .chunk([1, rows, cols])
            .shape([num_images, rows, cols])
            .create("matrix")?;
        Ok(Self {
            file,
            dataset,
            position: 0,
        })
    }

    /// Write a new V species concentration table to the file
    pub fn write(&mut self, result: &UV) -> hdf5::Result<()> {
        self.dataset
            .write_slice(&result.v, (self.position, .., ..))?;
        self.position += 1;
        Ok(())
    }

    /// Flush remaining data to the underlying storage medium and close the file
    ///
    /// This should automatically happen on Drop, but doing it manually allows
    /// you to catch and handle I/O errors properly.
    pub fn close(self) -> hdf5::Result<()> {
        self.file.close()
    }
}
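
As the close() documentation hints, propagating errors with ? is not the only option; a caller can also handle them on the spot. Here is a minimal sketch of that, assuming a writer variable of type HDF5Writer:

if let Err(error) = writer.close() {
    eprintln!("Failed to flush and close the HDF5 file: {error}");
}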

After adding this feature, our simulation code skeleton now looks like this:

// Set up the concentrations buffer
let mut concentrations = Concentrations::new(num_rows, num_cols);

// Set up HDF5 I/O
let mut hdf5 = HDF5Writer::create(file_name, concentrations.shape(), num_output_steps)?;

// Produce the requested amount of concentration tables
for _ in 0..num_output_steps {
    // Run a number of simulation steps
    for _ in 0..compute_steps_per_output_step {
        // Update the concentrations of the U and V chemical species
        concentrations.update(|start, end| {
            // TODO: Derive new "end" concentration from "start" concentration
            end.u.assign(&start.u);
            end.v.assign(&start.v);
        });
    }

    // Write down the current simulation output
    hdf5.write(concentrations.current())?;
}

// Close the HDF5 file
hdf5.close()?;

Reusable simulation skeleton

Right now, our simulation’s update function is a stub that simply copies the input concentrations to the output concentrations without actually changing them. At some point, we are going to need to compute the real updated chemical species concentrations there.

However, we also know from our growing experience with software performance optimization that we are going to need to tweak this part of the code a lot. It would be great if we could do this in a laser-focused function that is decoupled from the rest of the code, so that we can easily do things like swapping computation backends and seeing what difference it makes. As it turns out, a judiciously placed callback interface lets us do just that:

/// Simulation runner options
pub struct RunnerOptions {
    /// Number of rows in the concentration table
    num_rows: usize,

    /// Number of columns in the concentration table
    num_cols: usize,

    /// Output file name
    file_name: String,

    /// Number of simulation steps to write to the output file
    num_output_steps: usize,

    /// Number of computation steps to run between each write
    compute_steps_per_output_step: usize,
}

/// Simulation runner, with a user-specified concentration update function
pub fn run_simulation(
    opts: &RunnerOptions,
    // Notice that we must use FnMut here because the update function can be
    // called multiple times, which FnOnce does not allow.
    mut update: impl FnMut(&UV, &mut UV),
) -> hdf5::Result<()> {
    // Set up the concentrations buffer
    let mut concentrations = Concentrations::new(opts.num_rows, opts.num_cols);

    // Set up HDF5 I/O
    let mut hdf5 = HDF5Writer::create(
        &opts.file_name,
        concentrations.shape(),
        opts.num_output_steps,
    )?;

    // Produce the requested amount of concentration tables
    for _ in 0..opts.num_output_steps {
        // Run a number of simulation steps
        for _ in 0..opts.compute_steps_per_output_step {
            // Update the concentrations of the U and V chemical species
            concentrations.update(&mut update);
        }

        // Write down the current simulation output
        hdf5.write(concentrations.current())?;
    }

    // Close the HDF5 file
    hdf5.close()
}
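
To make the decoupling concrete, here is a minimal usage sketch with a stub update callback, assuming a RunnerOptions value named opts is already in scope (e.g. built within the same module):

run_simulation(&opts, |start, end| {
    // TODO: Derive new "end" concentration from "start" concentration
    end.u.assign(&start.u);
    end.v.assign(&start.v);
})?;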

Command-line options

Our simulation has a fair number of tuning parameters. In addition to those that we have already listed in RunnerOptions, the computational chemistry of the Gray-Scott reaction requires that we add the following tunable parameters:

  • The speed at which V turns into P
  • The speed at which U is added to the simulation and U, V and P are removed
  • The amount of simulated time that passes between simulation steps

We could just hardcode all these parameters, but doing so would anger the gods of software engineering and break feature parity with the reference C++ version. So instead we will make these parameters configurable via command-line options whose syntax and semantics strictly match those of the C++ version.

To this end, we can use the excellent clap library, which provides the best API for parsing command line options that I have ever seen in any programming language.

The first step, as usual, is to add clap as a dependency to our project. We will also enable its optional derive feature, which is the key to the aforementioned nice API:

cargo add --features=derive clap

We will then add some annotations to the definition of our options structs, explaining how they map to the command-line options that our program expects (which follow the syntax and defaults of the C++ reference version for interoperability):

use clap::Args;

/// Simulation runner options
#[derive(Debug, Args)]
pub struct RunnerOptions {
    /// Number of rows in the concentration table
    #[arg(short = 'r', long = "nbrow", default_value_t = 1080)]
    pub num_rows: usize,

    /// Number of columns in the concentration table
    #[arg(short = 'c', long = "nbcol", default_value_t = 1920)]
    pub num_cols: usize,

    /// Output file name
    #[arg(short = 'o', long = "output", default_value = "output.h5")]
    pub file_name: String,

    /// Number of simulation steps to write to the output file
    #[arg(short = 'n', long = "nbimage", default_value_t = 1000)]
    pub num_output_steps: usize,

    /// Number of computation steps to run between each write
    #[arg(short = 'e', long = "nbextrastep", default_value_t = 34)]
    pub compute_steps_per_output_step: usize,
}

/// Simulation update options
#[derive(Debug, Args)]
pub struct UpdateOptions {
    /// Speed at which V turns into P
    #[arg(short, long, default_value_t = 0.014)]
    pub killrate: Float,

    /// Speed at which U is added to the simulation and U, V and P are removed
    #[arg(short, long, default_value_t = 0.054)]
    pub feedrate: Float,

    /// Simulated time interval on each simulation step
    #[arg(short = 't', long, default_value_t = 1.0)]
    pub deltat: Float,
}

We then create a top-level struct which represents our full command-line interface…

use clap::Parser;

/// Gray-Scott reaction simulation
///
/// This program simulates the Gray-Scott reaction through a finite difference
/// schema that gets integrated via the Euler method.
#[derive(Debug, Parser)]
#[command(version)]
pub struct Options {
    #[command(flatten)]
    runner: RunnerOptions,
    #[command(flatten)]
    pub update: UpdateOptions,
}

…and in the main function of our final application, we call the automatically generated parse() method of that struct and retrieve the parsed command-line options.

fn main() {
    let options = Options::parse();

    // ... now do something with "options" ...
}

That’s it. With no extra work, clap will automatically provide our simulation with a command-line interface that follows all standard Unix conventions (e.g. supports both --option value and --option=value), handles user errors, parses argument strings to their respective concrete Rust types, and prints auto-generated help strings when -h or --help is passed.
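
For example, using the simulate binary that will be introduced later in this chapter, all of the following invocations set the number of rows to the same value:

# Equivalent ways of overriding the default number of rows
cargo run --release --bin simulate -- --nbrow 512
cargo run --release --bin simulate -- --nbrow=512
cargo run --release --bin simulate -- -r 512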

Also, if you spend 10 more minutes on it, you can make as many of these options as you want configurable via environment variables too, which can be convenient in scenarios where you cannot receive configuration through CLI parameters, like inside of criterion microbenchmarks.
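
As a sketch of what this would look like, after enabling clap's optional env feature (e.g. with cargo add --features=derive,env clap), a field can additionally be bound to an environment variable. The variable name used here is just an example:

/// Number of rows in the concentration table
#[arg(short = 'r', long = "nbrow", env = "NBROW", default_value_t = 1080)]
pub num_rows: usize,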

Hardcoded parameters

Not all parameters of the C++ reference version are configurable. Some of them are hardcoded, and can only be changed by altering the source code. Since we are aiming for perfect user interface parity with the C++ version, we want to replicate this design in the Rust version.

For now, we will do this by adding a few constants with the hardcoded values to the source code:

/// Weights of the discrete convolution stencil
pub const STENCIL_WEIGHTS: [[Float; 3]; 3] = [
    [0.25, 0.5, 0.25],
    [0.5,  0.0, 0.5],
    [0.25, 0.5, 0.25]
];

/// Offset from the top-left corner of STENCIL_WEIGHTS to its center
pub const STENCIL_OFFSET: [usize; 2] = [1, 1];

/// Diffusion rate of the U species
pub const DIFFUSION_RATE_U: Float = 0.1;

/// Diffusion rate of the V species
pub const DIFFUSION_RATE_V: Float = 0.05;

In Rust, const items let you declare compile-time constants, much like constexpr variables in C++, parameters in Fortran, and #define STUFF 123 in C. We do not have the time to dive into the associated language infrastructure, but for now, all you need to know is that the value of a const will be copy-pasted at each point of use, which ensures that the compiler optimizer can specialize the code for the value of the parameter of interest.
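
As a quick illustration (not part of the simulation code), using a const inside a function lets the compiler treat it as a literal:

const N: usize = 4;

/// The optimizer sees the literal 4 here, and can e.g. fully unroll the loop
fn sum_first_n(values: &[f32]) -> f32 {
    values[..N].iter().sum()
}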

Progress reporting

Simulations can take a long time to run. It is not nice to make users wait for them to run to completion without any CLI output indicating how far along they are and how much time remains until they are done, especially when it is very easy to add such reporting in Rust thanks to the wonderful indicatif library.

To use it, we start by adding the library to our project’s dependencies…

cargo add indicatif

Then in our main function, we create a progress bar with a number of steps matching the number of computation steps…

use indicatif::ProgressBar;


let progress = ProgressBar::new(
    (options.runner.num_output_steps
        * options.runner.compute_steps_per_output_step) as u64,
);

…we increment it on each computation step…

progress.inc(1);

…and at the end of the simulation, we tell indicatif that we are done [1]:

progress.finish();

That’s all we need to add basic progress reporting to our simulation.
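
For reference, here is one possible way to wire these calls together in the main function, assuming the progress bar created above is in scope and that main returns an hdf5::Result so that ? can be used. The exercises code may organize this differently, for instance by driving the progress bar from inside the runner:

run_simulation(&options.runner, |start, end| {
    // TODO: Derive new "end" concentration from "start" concentration
    end.u.assign(&start.u);
    end.v.assign(&start.v);
    progress.inc(1);
})?;
progress.finish();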

Final code layout

This is not a huge amount of code overall, but it does get uncomfortably large and unfocused for a single code module. So in the exercises Rust project, the simulation code has been split over multiple code modules.

We do not have the time to cover the Rust module system in this course, but if you are interested, feel free to skim through the code to get a rough idea of how modularization is done, and ask any question that comes to your mind while doing so.

Microbenchmarks can only access code from the main library (below the src directory of the project, excluding the bin/ subdirectory), so most of the code lives there. In addition, we have added a simulation binary under src/bin/simulate.rs and a microbenchmark under benches/simulate.rs.
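
As a rough sketch (the exact module names may differ in your copy of the exercises), the project layout looks like this:

exercises/
├── src/                  # Main library: simulation data structures, runner, options, ...
│   └── bin/
│       └── simulate.rs   # Simulation binary
└── benches/
    └── simulate.rs       # Criterion microbenchmark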

Exercise

Here is a naïve implementation of a Gray-Scott simulation step implemented using ndarray:

use crate::options::{DIFFUSION_RATE_U, DIFFUSION_RATE_V, STENCIL_OFFSET, STENCIL_WEIGHTS};

/// Simulation update function
pub fn update(opts: &UpdateOptions, start: &UV, end: &mut UV) {
    // Species concentration matrix shape
    let shape = start.shape();

    // Iterate over pixels of the species concentration matrices
    ndarray::azip!(
        (
            index (out_row, out_col),
            out_u in &mut end.u,
            out_v in &mut end.v,
            &u in &start.u,
            &v in &start.v
        ) {
            // Determine the stencil's input region
            let out_pos = [out_row, out_col];
            let stencil_start = array2(|i| out_pos[i].saturating_sub(STENCIL_OFFSET[i]));
            let stencil_end = array2(|i| (out_pos[i] + STENCIL_OFFSET[i] + 1).min(shape[i]));
            let stencil_range = array2(|i| stencil_start[i]..stencil_end[i]);
            let stencil_slice = ndarray::s![stencil_range[0].clone(), stencil_range[1].clone()];

            // Compute the diffusion gradient for U and V
            let [full_u, full_v] = (start.u.slice(stencil_slice).indexed_iter())
                .zip(start.v.slice(stencil_slice))
                .fold(
                    [0.; 2],
                    |[acc_u, acc_v], (((in_row, in_col), &stencil_u), &stencil_v)| {
                        let weight = STENCIL_WEIGHTS[in_row][in_col];
                        [acc_u + weight * (stencil_u - u), acc_v + weight * (stencil_v - v)]
                    },
                );

            // Deduce the change in U and V concentration
            let uv_square = u * v * v;
            let du = DIFFUSION_RATE_U * full_u - uv_square + opts.feedrate * (1.0 - u);
            let dv = DIFFUSION_RATE_V * full_v + uv_square
                - (opts.feedrate + opts.killrate) * v;
            *out_u = u + du * opts.deltat;
            *out_v = v + dv * opts.deltat;
        }
    );
}

/// Shorthand for creating a 2D Rust array from an index -> value mapping
fn array2<T>(f: impl FnMut(usize) -> T) -> [T; 2] {
    std::array::from_fn(f)
}

Please integrate it into the codebase so that it is used by both the simulation binary at src/bin/simulate.rs and the microbenchmark at benches/simulate.rs. Then make sure everything works by running both of them using the following commands:

# Must use -- to separate cargo options from program options
cargo run --release --bin simulate -- -n 5 -e 2
cargo bench --bench simulate

It is expected that the last command will take a few minutes to complete. We are just at the start of our journey, and there’s a lot of optimization work to do. But the set of benchmark configurations is designed to remain relevant by the time the simulation is running much, much faster.


Also, starting at this chapter, the exercises are going to get significantly more complex. Therefore, it is a good idea to keep track of old versions of your work and have a way to get back to them. To do this, you can turn the exercises codebase into a git repository…

cd ~/exercises
git init
git add --all
git commit -m "Initial commit"

…then save a commit at the end of each chapter, or more generally whenever you feel like you have a codebase state that’s worth keeping around for later.

git add --all
git commit -m "<Describe your code changes here>"

[1] This step is needed because indicatif allows you to add more work to the progress bar.