from devito import Eq, Grid, Operator, TimeFunction
from devitopro import *
from devitopro.types.enriched import NoLayers
nt = 4
grid = Grid(shape=(4, 4))
u = TimeFunction(name='u', grid=grid)
usave = TimeFunction(name='usave', grid=grid, save=nt, compression='noop')
v = TimeFunction(name='v', grid=grid)
vsave = TimeFunction(name='vsave', grid=grid, save=nt, layers=NoLayers)Compression
There exist a variety of approximation methods to reduce the memory footprint of adjoint-based optimization problems such as FWI. Among the most popular is compression, which is widely used in industry.
In this notebook we show how to compress and decompress TimeFunctions with DevitoPRO.
Support
Currently, DevitoPRO supports the following compression libraries:
Bitcomp. This is the state-of-the-art compression library by NVidia for seismic imaging. It employs an adaptive delta for integer quantization based on the maximum amplitude value at compression time.CvxCompress. This is the MIT-licensed wavelete-based compression library by Chevron specifically designed for wavefield compression. It emplys a multi-scale multi-dimensional wavelet transform. It employs a relative apmlitude thresholdcompscaleto zero out small wavelet coefficients and local blocks for parallelization.
These libraries can be used with the following combinations of platform, arch, and language:
| Library | platform | arch | language |
|---|---|---|---|
| Bitcomp | cpu64 | nvc | {C, openmp} |
| Bitcomp | nvidiaX | cuda | cuda |
| CvxCompress | cpu64 | nvc/gcc/icc | {C, openmp} |
In the table above, cpu64 represents any CPU architecture while nvidiaX represents any NVidia GPU. For a comprehensive list of possible platform, arch, and language values, see here.
API
It is up to the user to state what TimeFunctions will be compressed and decompressed. The API is straightforward:
usave = TimeFunction(name='usave', grid=grid, save=nt, compression=<compression library name>)
The compiler will generate code that:
- compresses
usaveupon each write. - decompresses
usavebefore it’s read.
It’s as simple as that!
Bitcomp
usave = TimeFunction(name='usave', grid=grid, save=nt, compression='bitcomp')
We recall from above that Bitcomp uses an adaptive delta for integer quantization based on the maximum amplitude value at compression time. The maximum error between the uncompressed and the original data will not exceed delta. Since
delta = (dataMax - dataMin) / (2^nbits - 1),
the user can control the maximum error via nbits, which, in other words, represents the number of bits a floating point number gets mapped to. After empirical evaluation, we chose nbits=9 as default value, which means delta = dataMax / 511. One can easily try different nbits values by supplying an override at Operator-application time, for example:
op = Operator(...)
op.apply(..., nbits=16)
CvxCompress
usave = TimeFunction(name='usave', grid=grid, save=nt, compression='cvxcompress')
We recall from above that CvxCompress uses an adaptive compscale for thresholding based on the maximum amplitude value at compression time. The user can control the maximum error via compscale, which, in other words, represents the how much values will be thrown out in the avelet domain. After empirical evaluation, we chose compscale=1e-3 as default value. One can easily try different compsclae values by supplying an override at Operator-application time, for example:
op = Operator(...)
op.apply(..., compscale=1e-5)
where higher values will lead to more compression (and more loss) while smaller values will lead to less compression and more accurate reconstruction. Recommended values are between 1e-2 and 1e-6.
Usage
Alright, let’s look at an actual example. We’re gonna use the built-in “dummy” compression library noop, which, instead of actual compression/decompression routines, uses memcpy’s, thus simulating the worst-case scenario for compression efficiency. This makes the example compression-agnostic and readily runnable on any system as long as DevitoPRO is installed.
We start with declaring some symbolic objects.
We now create a trivial forward-marching Operator, which emulates a PDE solve.
eqns0 = [Eq(u.forward, u + 1.),
Eq(usave, u)]
op0 = Operator(eqns0)There are some lessons here:
- First of all, we observe there are two
TimeFunctions:uis used in the PDE,usaveis, in essence, for stacking snapshots. - In
Eq(usave, u),usavegets written; as explained above, this will trigger data compression after theEqis performed. This is opaque to the user, and – good news – no other actions are expected on the user front. However, it may help to know when compression takes place. - In this example we’re using
compression='noop', so the content ofuat a given timestep will be transfered “as is” intousave. - A more complicated
Eqcould have been used, for exampleEq(usave, u.dt2). No issues with that! In this case, firstu.dt2gets calculated, then the result is compressed intousave.
Let’s now proceed with a trivial backward-marching Operator, which emulates an adjoint operator.
eqns1 = [Eq(v.backward, v - 1),
Eq(vsave, usave)]
op1 = Operator(eqns1)Some more observations:
- In Eq(vsave, usave), usave is read; as explained above, this will trigger data decompression before the
Eqis performed. Again, it may help to know when decompression “logically” takes place. - We say “logically” because, under-the-hood, the compiler performs several optimizations to maximize the efficiency of the decompression, among which:
- Anticipating decompression as much as possible, in tandem with prefetching if the data resides on disk or in GPU memory;
- Hiding (overlapping) decompression with computation whenever possible.
- More complicated
Eqs are supported, e.g.Eq(vsave, usave.dx2).
OK, let’s now try running these two Operators.
#NBVAL_IGNORE_OUTPUT
summary0 = op0.apply(time_M=nt-1)
summary1 = op1.apply()Operator `Kernel` ran in 0.01 s
Operator `Kernel` ran in 0.01 s
Alright, they ran. What do we expect to see now? Well, at the end of op1, vsave should contain the same data as usave. So let’s have a look.
import numpy as np
from devitopro.builtins import decompress
# Access decompressed data
usave_dec, = decompress(usave)
assert np.all(usave_dec.data == vsave.data)
assert all(np.all(vsave.data[i] == i) for i in range(nt))
print(vsave.data)[[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]]
[[2. 2. 2. 2.]
[2. 2. 2. 2.]
[2. 2. 2. 2.]
[2. 2. 2. 2.]]
[[3. 3. 3. 3.]
[3. 3. 3. 3.]
[3. 3. 3. 3.]
[3. 3. 3. 3.]]]
Implications
There are some implications on the user front.
- The
.dataattribute will only show a representation of the compressed data. - The
.shapeattribute will return a dummy value, as the compressed data is inherently unstructured.
However, in real applications there is hardly a need to access the .data field directly: usave will be compressed in a forward modelling operator and seamlessly decompressed in an adjoint operator. However, should the user want to peek at the data in between these two operators, the built-in function decompress is provided:
from devitopro.builtins import decompress
data = decompress(usave)
Limitations
Currently, compressed TimeFunctions are subjected to some limitations.
- An
Operatorcan write or read a compressedTimeFunction, but not both. - A compressed
TimeFunctionis a “one-time write” object: it can only be written by a specificOperator, and only via a single call toop.apply(). Otherwise, an exception will be raised by the runtime.
On the other hand, a compressed TimeFunction can be read by an arbitrary number of Operators.
We introduced these limitations to simplify the implementation.
However, in our experience, their significance in the context of PDE-constrained optimization algorithms is negligible.
FAQs
My code would look neater if I could re-initialize usave and use it to compute another shot in my inversion loop.
Fair enough. Compressed TimeFunctions often need a large number of arguments to be instantiated, so we’d like to avoid that. Well, it turns out you can easily create “twin” or “clone” TimeFunctions via the .func API.
usave1 = usave.func(name='usave1')
It doesn’t matter at this point whether usave had already been consumed or not. usave1 is a brand-new object, which at the moment carries no data whatsoever. And it’s structurally identical to usave except for the name. In other words, the above is equivalent to:
usave1 = TimeFunction(name='usave1', <all other arguments used to create `usave`> )
Requirements
All the compression libraries described in this notebook are provided out-of-the-box by DevitoPRO, so they don’t need to be installed separately.
Further, if you’re using our Dockerfiles, you’re already good to go.
Otherwise, you should be aware of the following dependencies:
- Bitcomp. The NVidia HPC SDK must be available on the system, even when running on just CPUs. The NVidia HPC SDK ships the CUDA toolkit, which in turn provides header files (e.g., for half-precision types) that will be required by Bitcomp at jit-compilation time. So, yes – CUDA must be around even if you don’t care about NVidia GPUs.