5 min to read
Announcing Devito and DevitoPRO v4.8.7
Compounding gains with each release
Devito and DevitoPRO v4.8.7, brings a host of new features, enhancements, and optimizations to our high-performance computing (HPC) tools.
About open-source Devito
Devito is an invaluable tool for solving partial differential equations (PDEs) on structured grids, generating optimized C code from a high-level symbolic specification, facilitating HPC applications.
Key Features:
- High-productivity symbolic PDE solver specification with symbolic computation capabilities.
- Automatic code generation, producing finite-difference (FD) kernels, targeting a diverse range of architectures.
- Comprehensive, multi-level performance optimization.
About DevitoPRO
DevitoPRO builds upon Devito open-source’s foundation, adding advanced features and optimizations specifically for high-productivity HPC. This commercial extension targets seismic imaging and inversion workloads, supporting multiple hardware architectures, including AMD, Intel and Nvidia GPUs, to deliver enhanced performance and reduced development time.
Key Features:
- Additional DSL abstractions for developing RTM/FWI solutions.
- Enhanced compiler and toolchain for enterprise needs
- Advanced GPU acceleration using vendor-native languages (CUDA, HIP, SYCL)
Devito and DevitoPRO are released in lockstep. The new features and improvements in version 4.8.7, detailed below, collectively enhance the performance, usability, and scalability of both tools, making them indispensable for high-performance computing applications in fields such as seismic imaging and inversion.
New Features in Devito v4.8.7
API Enhancements
- Time Derivatives Expansion: Expand time derivatives in generated code for better clarity and consistency in computations.
- Improved Derivatives API: Cleaner abstractions for interpolants and interpolated derivatives to be compactly specified
- SparseFunction Improvements: Revamp sparse subfunction setup and sparse dimension handling to optimize grid operations.
- Improve examples: Clarify sparse function setup and interpolation examples.
Compiler Improvements
- Device-Aware Blocking: Implement device-aware blocking and refine related tests to improve performance on multiple architectures.
- Elementary Functions Optimization: Make code generation of elementary functions dtype-aware for improved type handling.
- ConditionalDimension Placement: Fix placement of
ConditionalDimension
within loop nests when used in combination with subdomains. - Memory Management: Restructure
MemoryAllocator
hierarchy to streamline memory allocation and deallocation processes.
MPI and Parallel Computing
- MPI Initialization and Finalization: Refine optional MPI initialization and finalization to better manage parallel execution environments.
- Threading Support: Initialize MPI with threading support to leverage multi-threading capabilities.
- C-Level MPI_Allreduce: Add support for C-level
MPI_Allreduce
to facilitate efficient distributed reductions. - Data Gathering Fixes: Fix data gathering for sparse functions to ensure accurate data collection in parallel computations.
- Halo Touch Sequentialization: Sequentialize halo touch operations to prevent race conditions in parallel environments.
- MPI0 Logging Level: Add
MPI0
logging level to restrict performance logging to rank 0, reducing overhead in multi-rank setups.
Tutorials and Examples
- ADER-FD: Add notebook demonstrating implementation of ADER-FD schemes.
- Tutorials for new API features: Add and update examples for enhanced FD API and sinc interpolation/injection.
Testing and Continuous Integration (CI)
- Parallel Marker Revamp: Revamp parallel markers in tests to improve test coverage and reliability in parallel execution scenarios.
Docker and Build Enhancements
- Docker Build Fixes: Fix Nvidia Docker builds and Intel OneAPI setups to streamline containerized deployments and compatibility.
- ARM Architecture Support: Build ARM base images to support ARM architecture, extending the range of supported hardware platforms.
Miscellaneous Enhancements
- Logging and Error Handling Improvements: Improve logging and error handling for MPI and compiler sniff operations to provide clearer diagnostics and error messages.
- Code Cleanup and Refactoring: Perform extensive code cleanup and refactoring, including polishing built-ins, updating cached properties, and removing unnecessary code to maintain a clean and efficient codebase.
- Dependency Updates: Drop support for Python 3.7 and update requirements for dependencies like
mpi4py
to ensure the project stays up-to-date with the latest software versions and standards.
New Features in DevitoPRO v4.8.7
Performance Optimizations
- Advanced Compiler Optimizations: Implement various compiler optimizations to reduce runtime and increase computational efficiency.
- Multi-GPU Scaling: Enhance scaling across multiple GPUs, allowing better utilization of hardware resources and significantly improving performance in large-scale simulations.
- Device-Aware Blocking: Add device-aware blocking to optimize memory and compute resource allocation for heterogeneous computing environments, including CPU and GPU.
API Enhancements
- Extended Symbolic Functionality: Introduce new symbolic operations and functions to support more sophisticated mathematical models and simulations.
- Boundary Conditions Support: Improve support for specifying boundary conditions and provide greater flexibility in defining simulation domains.
Parallel Computing and MPI
- Improved MPI Parallelism: Enhance parallel computing capabilities using MPI, ensuring better scalability on large clusters and more efficient parallel execution.
- C-Level MPI_Allreduce: Add support for C-level
MPI_Allreduce
operations to facilitate efficient distributed reductions, improving the performance of parallel algorithms.
Debugging and Profiling Tools
- New Debugging Tools: Introduce new debugging tools and logging capabilities, making it easier to troubleshoot and optimize simulations.
- Performance Profiling: Integrate performance profiling tools to help developers identify and address bottlenecks in their code, ensuring optimal performance.
Compiler Improvements
- Memory Management: Restructure the
MemoryAllocator
hierarchy to streamline memory management processes, reducing overhead and improving performance.
Miscellaneous Enhancements
- Preconditioning Techniques: Add new preconditioning techniques to improve the convergence rates of iterative solvers, making simulations more efficient and accurate.
- Code Cleanup and Refactoring: Perform extensive code cleanup and refactoring, including polishing built-ins, updating cached properties, and removing unnecessary code, ensuring a clean and maintainable codebase.
- Dependency Updates: Update software requirements and dependencies to ensure compatibility with the latest versions and standards, including dropping support for outdated Python versions and updating key libraries.