intel / pti-gpu
Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysis on Intel(R) Processor Graphics easily
Repository Overview (README excerpt)
Profiling Tools Interfaces for GPU (PTI for GPU)

Overview
This repository describes the ways of collecting performance data for Intel(R) Processor Graphics and provides a set of samples that help to get started.

License
Samples for Profiling Tools Interfaces for GPU (PTI for GPU) are distributed under the MIT License. You may obtain a copy of the License at https://opensource.org/license/MIT

Supported OS
• Linux
*Windows support is under development*

Supported Platforms
• Intel(R) Processor Graphics Gen9 (formerly Skylake) and newer
• Intel® Iris® Xe Graphics
• Intel® Data Center GPU Flex Series
• Intel® Data Center GPU Max Series
*Some samples may have higher hardware requirements*

Regularly Tested Configurations
• Ubuntu 20.04 with Intel(R) Iris(R) Plus Graphics 655

Profiling Chapters
• Runtime API Tracing
  • for OpenCL(TM)
  • for oneAPI Level Zero (Level Zero)
  • for OpenMP*
• Device Activity Tracing
  • for OpenCL(TM)
  • for oneAPI Level Zero (Level Zero)
  • for SYCL/DPC++
• Binary/Source Correlation
  • for OpenCL(TM)
  • for oneAPI Level Zero (Level Zero)
• Metrics Collection
  • based on oneAPI Level Zero (Level Zero) Metric API
  • based on Intel(R) Metrics Discovery Application Programming Interface
  • based on Performance Monitoring (PM) Register
• Binary Instrumentation
  • based on Graphics Technology Pin (GT Pin)
  • based on OpenCL(TM) built-in intrinsics
• Code Annotation
  • based on Instrumentation and Tracing Technology API (ITT API)
• System Management
  • for oneAPI Level Zero (Level Zero)

Profiling & Debug Tools
• unitrace - unified tracing and profiling tool. In addition to Level Zero and/or OpenCL, this tool is capable of profiling software layers in the software stack, for example, SYCL and plugins, oneCCL, MPI etc., for scale-up and scale-out applications. It also supports profiling hardware metrics (including instruction-level EU stalls) and software events at the same time.
• gpuinfo - provides basic information about the GPUs installed in a system, and the list of HW metrics one can collect for it;
• instcount - prints GPU kernel assembly (GEN ISA) annotated by instruction execution count;
• sysmon - Linux "top" like utility to monitor GPUs installed on a system;

Sample Tools & Utilities
• tools for OpenCL(TM), DPC++ (with OpenCL(TM) backend) and OpenMP* GPU offload (with OpenCL(TM) backend):
  • cl_hot_functions - provides a list of hottest OpenCL(TM) API calls by backend (CPU and GPU);
  • cl_hot_kernels - provides a list of hottest OpenCL(TM) kernels by backend (CPU and GPU);
  • cl_debug_info - prints source and assembly (GEN ISA) for kernels on GPU;
  • cl_gpu_metrics - provides a list of hottest OpenCL(TM) GPU kernels along with the percent of cycles each was active, stalled and idle (based on continuous metrics collection mode);
  • cl_gpu_query - provides a list of hottest OpenCL(TM) GPU kernels along with the percent of cycles each was active, stalled and idle (based on query metrics collection mode);
• tools for Level Zero, DPC++ (with Level Zero backend) and OpenMP* GPU offload (with Level Zero backend):
  • ze_hot_functions - provides a list of hottest Level Zero API calls;
  • ze_hot_kernels - provides a list of hottest Level Zero kernels;
  • ze_debug_info - prints source and assembly (GEN ISA) for kernels on GPU;
  • ze_metric_query - provides a list of hottest Level Zero GPU kernels along with the percent of cycles each was active, stalled and idle (metrics are collected in *query* mode);
  • ze_metric_streamer - provides a list of hottest Level Zero GPU kernels along with the percent of cycles each was active, stalled and idle (metrics are collected in *streamer* mode);
• tools for OpenMP*:
  • omp_hot_regions - provides a list of hottest parallel (for CPU) and target (for GPU) OpenMP* regions;
• utilities:
  • dpc_info - prints information on available platforms and devices in DPC++;
  • ze_info - prints information on available platforms and devices in Level Zero;
  • ze_metric_info - prints the list of HW metrics one can collect with the help of Level Zero;
  • gpu_perfmon_set - allows choosing the HW metric for collection in the EU PerfMon register;

Prerequisites
• CMake (version 3.12 and above)
• Git (version 1.8 and above)
• Python (version 3.6 and above)
• On Linux one has to be a part of the (Ubuntu 18 and below) or (Ubuntu 19 and above) user group to do any computations on Intel(R) Processor Graphics:
• OpenCL(TM) ICD Loader and Headers
  • to use non-standard path to OpenCL ICD library one may add it into :
• oneAPI Level Zero loader
• Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver
• Intel(R) Metrics Discovery Application Programming Interface
  • one may need to install package to build the library from sources
  • one may need to allow metrics collection for non-root users
    • if using the i915 kernel module (e.g. PVC):
    • if using the xe kernel module (e.g. BMG):
• Metrics Library for Metrics Discovery API (Metrics Library for MD API)
• Graphics Technology Pin (GT Pin)
• Intel(R) oneAPI Base Toolkit
• libdrm
  • on Ubuntu one may perform:
More information on what is needed for a particular sample can be found on the sample description page.

Build and Run
In general, to build samples one needs to perform the following steps (specific instructions for a particular sample can be found on the sample description page):
To point out to specific headers and libraries one may use and options correspondingly, e.g.:
Run instructions may vary significantly from sample to sample, so they are provided on the particular sample description page.

Testing
There is a way to build and test all the samples in one command, e.g.:
In case of failed tests, error output will be available in file.
It's also possible to test an exact sample or a group of samples, e.g.:
To run testing in debug mode one may use option, e.g.:
The script creates directory inside each sample folder while testing. To remove all of these folders, use:
**Tested software versions one may find in SOFTWARE file.**

Known Issues
• On RHEL IGA library may not…
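
The minimum tool versions listed under Prerequisites (CMake 3.12, Git 1.8, Python 3.6) can be checked before attempting a build. The following is a minimal sketch and not part of the repository: the helper names (`parse_version`, `meets_minimum`, `installed_version`, `check_prerequisites`) are hypothetical, and it assumes that `cmake --version` and `git --version` print a dotted version number somewhere in their output.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions copied from the Prerequisites list.
MINIMUMS = {"cmake": (3, 12), "git": (1, 8)}
PYTHON_MINIMUM = (3, 6)


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples on the components the minimum specifies."""
    return found is not None and found[:len(minimum)] >= minimum


def installed_version(tool):
    """Run `<tool> --version` and parse the output; None if the tool is not on PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # keeps the sketch usable on Python 3.6
        check=False,
    )
    return parse_version(result.stdout)


def check_prerequisites():
    """Map each tool name to True if its installed version satisfies the minimum."""
    report = {
        tool: meets_minimum(installed_version(tool), minimum)
        for tool, minimum in MINIMUMS.items()
    }
    report["python"] = meets_minimum(tuple(sys.version_info[:3]), PYTHON_MINIMUM)
    return report


if __name__ == "__main__":
    for tool, ok in check_prerequisites().items():
        print("{}: {}".format(tool, "ok" if ok else "missing or too old"))
```

The sketch only covers the three tools whose minimum versions are stated; the drivers, loaders and libraries in the same list have no stated version and would need their own checks.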