Compression
A closer look at the I/O code
Intuitively, storage I/O performance while the simulation is running can be affected either by the way storage I/O is configured, or by what we are doing on every storage write:
/// Create or truncate the file
///
/// The file will be dimensioned to store a certain amount of V species
/// concentration arrays.
///
/// The `Result` return type indicates that this method can fail and the
/// associated I/O errors must be handled somehow.
pub fn create(file_name: &str, shape: [usize; 2], num_images: usize) -> hdf5::Result<Self> {
    // The ? syntax lets us propagate errors from an inner function call to
    // the caller, when we cannot handle them ourselves.
    let file = File::create(file_name)?;
    let [rows, cols] = shape;
    let dataset = file
        .new_dataset::<Float>()
        .chunk([1, rows, cols])
        .shape([num_images, rows, cols])
        .create("matrix")?;
    Ok(Self {
        file,
        dataset,
        position: 0,
    })
}

/// Write a new V species concentration table to the file
pub fn write(&mut self, result_v: ArrayView2<Float>) -> hdf5::Result<()> {
    self.dataset
        .write_slice(result_v, (self.position, .., ..))?;
    self.position += 1;
    Ok(())
}
Obviously, we cannot change much in write(), so let's focus on what happens inside of create(). There are two obvious areas of leverage:
- We can change our hardcoded chunk size of 1 to something larger, and see if doing I/O at a coarser granularity helps (see the sketch after this list).
- We can try to enable additional HDF5 options, such as compression, to reduce the volume of data that is eventually sent to the storage device.
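For illustration, the first option might look as follows. This is only a sketch: the value of 8 images per chunk is an arbitrary example rather than a recommendation, and the best setting can only be found by measuring (the second option is what the exercise below is about).

let dataset = file
    .new_dataset::<Float>()
    // Hypothetical example value: store 8 consecutive concentration arrays
    // per HDF5 chunk instead of 1
    .chunk([8, rows, cols])
    .shape([num_images, rows, cols])
    .create("matrix")?;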
In which order should we perform these optimizations? Well, compression is affected by the chunk size, since larger chunks feed the compression engine with more data at once, which can be either good (more redundancy to exploit in each compressed block) or bad (worse CPU cache locality slowing down the compression algorithm). If we tuned the chunk size first and only enabled compression afterwards, that tuning work would likely need to be redone. Therefore, we should try to enable compression first, and tune the chunk size later with the compression settings that we intend to keep.
Exercise
The course author's previous experience suggests that on modern NVMe storage devices, only the LZ4/LZO/LZF family of fast compressors is still worthwhile. Anything more sophisticated, even Zstandard at compression level 1, will result in a net slowdown.
Therefore, please try to enable LZF dataset compression…
let dataset = file
    .new_dataset::<Float>()
    .chunk([1, rows, cols])
    .lzf() // <- This is new
    .shape([num_images, rows, cols])
    .create("matrix")?;
…and see if it helps or hurts for this particular computation, on your storage hardware.
You will need to enable the lzf optional feature of the hdf5 crate for this to work. This has already been done for you in the container images to accommodate the network access policies of HPC centers, but for reference, you can do it like this:
cargo add --features=lzf hdf5
To see another side of the data compression tradeoff, also check the size of the output file before and after performing this change.
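For instance, assuming the output file is named output.h5 (adjust this to whatever file name your binary actually uses), standard shell tools can be used for the comparison:

# Report the file size before and after enabling LZF compression
ls -lh output.h5

# Optionally, confirm that the LZF filter was applied by printing the
# dataset's storage properties (requires the HDF5 command-line tools)
h5dump -p -H output.h5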