NERSC provides many popular profiling tools. Some of them are general-purpose tools and others are geared toward more specific tasks.
The following list gives a quick overview of the available performance analysis tools:
- Advisor: Intel Advisor provides two workflows, Vectorization Advisor and Threading Advisor, to help Fortran, C, and C++ applications make the most of today's processors.
- Application Performance Snapshot (APS): APS is a lightweight, open-source profiling tool developed by the Intel VTune developers. Use APS for a quick view of how a shared-memory or MPI application uses the available hardware (CPU, FPU, and memory). APS reports your application's time spent in MPI, MPI and OpenMP load imbalance, memory access efficiency, FPU usage, and I/O and memory footprint.
- CrayPat (also called Perftools): CrayPat is a suite of HPE/Cray profiling tools for detailed analysis. It can report routine-level hardware counter data, MPI message statistics, I/O statistics, and more. In addition to performance data derived from sampling, it can trace selected routines (including library routines) to give a better understanding of the performance statistics associated with them.
- Darshan: Darshan is a lightweight I/O profiling tool capable of profiling POSIX I/O, MPI I/O, and HDF5 I/O.
- HPCToolkit: HPCToolkit can be used to measure both the CPU and GPU performance of GPU-accelerated applications. It can measure CPU performance using asynchronous sampling triggered by Linux timers or hardware counter events and it can monitor GPU performance using tool support libraries provided by GPU vendors (current support extends to NVIDIA and AMD GPUs).
- LIKWID: LIKWID ("Like I Knew What I'm Doing") is a lightweight suite of command-line utilities. By reading the MSR (Model Specific Register) device files, it reports various performance metrics such as FLOPS, bandwidth, load-to-store ratio, and energy.
- MAP: Arm MAP is a parallel, GUI-based sampling profiler. It graphically displays time series of the collected metrics for the entire run and annotates source code lines with performance metrics.
- Parallelware Trainer: Parallelware Trainer is an Integrated Development Environment designed to facilitate the learning, usage, and implementation of OpenMP/OpenACC parallel programming, along with the ability to test the performance improvements of particular parallel implementations.
- Performance Reports: Arm Performance Reports is a low-overhead tool that produces one-page text and HTML reports summarizing and characterizing both scalar and MPI application performance.
- Reveal: Reveal combines the HPE Cray CCE program library (used for source code analysis) with performance data collected by CrayPat to identify the most time-consuming loops and suggest compiler directives for adding OpenMP parallelism.
- Roofline Performance Model: The Roofline performance model offers an intuitive and insightful way to compare application performance against machine capabilities, track progress towards optimality, and identify bottlenecks, inefficiencies, and limitations in software implementations and architecture designs.
- Timemory: timemory is a toolkit and suite of tools for performance analysis, optimization studies, logging, and debugging. Timemory is an excellent choice for roofline analysis, built-in performance analysis, and managing multiple third-party profiling APIs.
- Trace Analyzer and Collector: Intel Trace Analyzer and Collector (ITAC) are two tools used for analyzing MPI behavior in parallel applications. ITAC identifies MPI load imbalance and communication hotspots in order to help developers optimize MPI parallelization and minimize communication and synchronization in their applications.
- VTune: Intel VTune is a GUI-based tool for identifying performance bottlenecks and getting performance metrics.
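The Roofline model mentioned above reduces to a simple bound: attainable performance is the lesser of peak compute throughput and arithmetic intensity (FLOPs per byte of memory traffic) times peak memory bandwidth. The following minimal Python sketch evaluates that bound. The peak figures are hypothetical placeholders, not measured values for any NERSC system, and the byte count for the example kernel assumes no cache reuse and no write-allocate traffic.

```python
"""Minimal Roofline bound calculator (illustrative sketch).

Attainable performance P = min(peak_compute, AI * peak_bandwidth), where
AI (arithmetic intensity) = FLOPs performed / bytes moved to or from memory.
"""


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return FLOPs per byte of memory traffic."""
    return flops / bytes_moved


def attainable_gflops(ai: float, peak_gflops: float, peak_bw_gbs: float) -> float:
    """Roofline ceiling in GFLOP/s for a kernel with intensity `ai` (FLOP/byte)."""
    return min(peak_gflops, ai * peak_bw_gbs)


def ridge_point(peak_gflops: float, peak_bw_gbs: float) -> float:
    """Intensity (FLOP/byte) at which the memory and compute ceilings meet."""
    return peak_gflops / peak_bw_gbs


def bound_type(ai: float, peak_gflops: float, peak_bw_gbs: float) -> str:
    """Classify a kernel as memory-bound or compute-bound under the model."""
    if ai < ridge_point(peak_gflops, peak_bw_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak compute, 200 GB/s memory bandwidth.
    PEAK_GFLOPS = 1000.0
    PEAK_BW_GBS = 200.0

    # DAXPY (y = a*x + y) in double precision: 2 FLOPs per element and
    # 24 bytes per element (read x, read y, write y; assumes no write-allocate).
    ai = arithmetic_intensity(flops=2.0, bytes_moved=24.0)
    print(f"AI = {ai:.3f} FLOP/byte")
    print(f"Ridge point = {ridge_point(PEAK_GFLOPS, PEAK_BW_GBS):.1f} FLOP/byte")
    print(f"Bound = {attainable_gflops(ai, PEAK_GFLOPS, PEAK_BW_GBS):.2f} GFLOP/s "
          f"({bound_type(ai, PEAK_GFLOPS, PEAK_BW_GBS)})")
```

With these assumed numbers, DAXPY's intensity of about 0.083 FLOP/byte sits far to the left of the ridge point at 5 FLOP/byte, so the model caps it near 16.7 GFLOP/s: it is memory-bound, and optimizing data movement matters more than adding FLOP throughput. Tools such as Advisor or timemory measure the real intensity and ceilings instead of relying on hand estimates like this one.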