Storage tuning
As you may have guessed, our copy optimization had an effect on microbenchmarks, but not on the full simulation job, because the simulation job does something that the microbenchmark doesn’t: it writes results to disk. Once the rest is taken care of, this activity becomes the performance bottleneck, at least at our default setting of 34 simulation steps per saved image.
Therefore, even though this is not the main topic of this course, I will cover a few things you can try to improve the simulation’s storage I/O performance. This chapter comes with the following caveats:
- I/O timings are a lot more variable than compute timings, which means that a stable benchmarking setup can only be obtained with a longer-running job. Microbenchmarking I/O requires a lot of care, and most of the time you should just time the full simulation job.
- The results of I/O optimizations will heavily depend on the storage medium that you are using. Most of the conclusions that I reach in the following chapters are explicitly marked as likely specific to my laptop’s NVMe storage, and users of slower storage like hard drives or networked filesystems should expect to reach very different conclusions.
- In real-world simulation workloads, you should always keep in mind that doing less I/O is another option, and one that should be on the team’s meeting agenda.
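
To get a feel for the first caveat, here is a minimal sketch that times the same write several times and reports the spread. It is written in Python for brevity and is not part of the simulation code: the `time_writes` and `summarize` helpers, the file names and the 8 MiB payload are all hypothetical stand-ins for one saved image.

```python
import os
import statistics
import tempfile
import time


def time_writes(payload: bytes, repeats: int = 10) -> list[float]:
    """Write `payload` to a fresh temporary file `repeats` times.

    Returns the wall-clock duration of each write, in seconds.
    """
    durations = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        for i in range(repeats):
            path = os.path.join(tmp_dir, f"image_{i}.bin")
            start = time.perf_counter()
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                # Ask the OS to push the data to the storage device, so that
                # we do not only measure a copy into the page cache.
                os.fsync(f.fileno())
            durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list[float]) -> dict[str, float]:
    """Reduce a list of timings to a few simple statistics."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Hypothetical payload size: 8 MiB, standing in for one saved image.
    stats = summarize(time_writes(b"\0" * (8 * 1024 * 1024)))
    print({name: f"{1e3 * value:.2f} ms" for name, value in stats.items()})
```

If the gap between the minimum and maximum is large compared to the median on your machine, that is a hint that short I/O measurements are too noisy to be trusted, and that timing the full simulation job is the safer choice.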